Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeseries object for Astropy (APE9) #12

Open
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
@Cadair
Copy link
Member

commented Mar 24, 2016

This is still in very early draft.

@abigailStev

This comment has been minimized.

Copy link

commented Mar 25, 2016

Pinging myself so I can follow development and discussion.

@ehsteve

This comment has been minimized.

Copy link

commented Mar 30, 2016

I am very interested in this. Thanks for putting this together @Cadair.

@ehsteve

This comment has been minimized.

Copy link

commented Mar 30, 2016

and thanks to everyone that participate in the discussion at PyAstro16 :P

Abstract
--------

The goal of a timeseries object in astropy is to provide a core set of

This comment has been minimized.

Copy link
@ehsteve

ehsteve Mar 30, 2016

remove the goal of sharing. that's a general astropy goal.

APE9.rst Outdated
functionality that can be shared between astropy, affiliated packages and the
wider community.

The scope of this timeseries object is designed to remain focused on the core

This comment has been minimized.

Copy link
@ehsteve

ehsteve Mar 30, 2016

Reword suggestion, the scope of this time series object is to provide the capability to support a unit and meta-data aware time series that supports basic analysis tasks relevant to the community. Such an object should also be able to easily export itself into other formats such as pandas series

This comment has been minimized.

Copy link
@ehsteve

ehsteve Mar 30, 2016

Should we mention performance? It should work well and quickly for large data sets (million entries) as well as small.

APE9.rst Outdated
#. Re-binning and re-sampling timeseries
#. Interpolating to different time stamps.
#. Masking
#. Support for subtraction and addition.

This comment has been minimized.

Copy link
@ehsteve

ehsteve Mar 30, 2016

maybe add (e.g. background)

``QTable`` class, with a few requirements specific to the timeseries class:

#. A 'Time' index column exists which enforces unique indexes.
#. The table is always sorted in terms of increasing time.

This comment has been minimized.

Copy link
@ehsteve

ehsteve Mar 30, 2016

if this is true then it does not need to provide sorting capability.

This comment has been minimized.

Copy link
@Cadair

Cadair Mar 30, 2016

Author Member

it needs to be able to sort itself, the list above is not end-user functionality.

This comment has been minimized.

Copy link
@dstansby

dstansby Jun 27, 2018

Why does the table always have to be sorted by time? Why shouldn't a user be able to sort by another non-index column?

#. The table is always sorted in terms of increasing time.


Binned Data vs Sampled Data

This comment has been minimized.

Copy link
@ehsteve

ehsteve Mar 30, 2016

I like calling out this distinction!

@ehsteve

This comment has been minimized.

Copy link

commented Mar 30, 2016

I think it would be helpful to list a number of examples of analyses that we would want to perform on these.

@eteq eteq changed the title Timeseries object for Astropy Timeseries object for Astropy (APE9) Aug 22, 2016

@eteq eteq force-pushed the astropy:master branch from f37e72e to 012787a Aug 22, 2016

@duncanmmacleod

This comment has been minimized.

Copy link

commented May 12, 2017

I am interested in this feature, and think it would be a great addition. GWpy would likely move to using/inheriting this object, rather than the current gwpy.timeseries.TimeSeries (which inherits from astropy.units.Quantity in the first place).
I'm happy to help with development and testing as desired.

@abigailStev

This comment has been minimized.

Copy link

commented May 12, 2017

I am in support of this APE! And I'm available to help with testing.

@eteq

This comment has been minimized.

Copy link
Member

commented May 12, 2017

I like it and would use it.

@eaydin

This comment has been minimized.

Copy link

commented May 12, 2017

I love it!

@abigailStev

This comment has been minimized.

Copy link

commented May 12, 2017

I would like for @StingraySoftware to move towards using a TimeSeries object (or at least basing our Lightcurve class on it) for lightcurves.

APE9.rst Outdated

This APE proposes that we place the following restrictions on binned data:

#. Contiguious bins, i.e. the start of the i+1th bin is the end of the ith bin.

This comment has been minimized.

Copy link
@SolarDrew

SolarDrew May 14, 2017

I think we decided this was not going to be necessary.

Many different areas of astrophysics have to deal with 1D timeseries data, i.e.
either sampling a continuous variable at fixed times or counting some events
binned into time windows. These types of applications require some basic
functionality:

This comment has been minimized.

Copy link
@SolarDrew

SolarDrew May 14, 2017

How many of these features already exist in QTable?

This comment has been minimized.

Copy link
@Cadair

Cadair Nov 10, 2017

Author Member

Some, but all would need to be customised for timeseries data (i.e. to preserve the index correctly.)

Cadair added some commits Mar 24, 2016

@Cadair Cadair force-pushed the Cadair:master branch from 2102d4d to fe853e7 Nov 10, 2017

address comments and add some API examples
Mainly examples on the difference in construction between binned and sampled data.
APE9.rst Outdated
functionality:

#. Extending timeseries with extra rows
#. Concatenating multiple timeseries objects

This comment has been minimized.

Copy link
@Cadair

Cadair May 1, 2018

Author Member

Re-word to clarify that concatenation is extra columns.


Initialize a ``SampledTimeSeries`` with a time and a data column::

ts = SampledTimeSeries(time=['2016-03-22T12:30:31', '2016-03-22T12:30:32', '2016-03-22T12:30:40'],

This comment has been minimized.

Copy link
@Cadair

Cadair May 1, 2018

Author Member

Should we be able to construct from a start time and a list of "deltas"?

API. Some examples of constructing these objects is given below.


Initialize a ``SampledTimeSeries`` with a time and a data column::

This comment has been minimized.

Copy link
@Cadair

Cadair May 1, 2018

Author Member

This section should also demonstrate constructing these objects with Time and TimeDelta so that it's clear that's a thing.

APE9.rst Outdated
#. Slicing / selecting time ranges (indexing)
#. Re-binning and re-sampling timeseries
#. Interpolating to different time stamps.
#. Masking

This comment has been minimized.

Copy link
@Cadair

Cadair May 1, 2018

Author Member

and missing value (NaN)

@astrofrog

This comment has been minimized.

Copy link
Member

commented Nov 5, 2018

@duncanmmacleod @abigailStev - currently the example implementation I linked to above allows you to do for example:

ts = SampledTimeSeries(time='2016-03-22T12:30:31', time_delta=3 * u.s, n_samples=1000)

The idea is that currently this just computes all the times, but in future we could make it so that the time column is dynamically calculated for efficiency, but this would just happen behind the scenes 'magically'. So this is what I mean by not actually needing a new class for that use case.

@abigailStev

This comment has been minimized.

Copy link

commented Nov 5, 2018

@astrofrog Ok! Yeah the idea is that since they're regularly sampled (and often quite long for my applications, like, 5e5 time bins in one segment), we don't want to always drag around the time column in addition to the 'counts' or 'rate' column.

@astrofrog

This comment has been minimized.

Copy link
Member

commented Nov 5, 2018

@abigailStev - that makes sense! Just out of curiosity, are your time series actually of binned photon events, and if so, is it important from your point of view to distinguish between the sampled and binned time series? (or would you actually treat them the same?)

@mhvk

This comment has been minimized.

Copy link
Contributor

commented Nov 5, 2018

@astrofrog - the question of separate or the same class is one of those questions that I fear can only be answered by doing it... But I think two classes makes more sense. If you do go that route, I think the binned time series should indeed have start_time, but unlike in your text, I think the time column should by default evaluate to the middle of the bin, not the start. That would make it much more useful, e.g., for a radial-velocity study based on fairly long spectroscopic observations.

Agreed that regularly sampled data should be part of this, perhaps even a subclass since it allows some handy speedups. But at some level those are implementation details. The text seems a little short on specific methods, etc., that you'd hope to provide.

@dstansby

This comment has been minimized.

Copy link

commented Nov 5, 2018

I am +1 for option A, ie. having separate classes for binned and sampled data. I think the is much more clear than option B. If in practice bin size is too small to care about, then a user is free to use a SampledTimeSeries, but I think it is good to force them to make the choice, which in turn will (maybe) force them to consider if it is the appropriate thing to do for a particular dataset.

@barentsen

This comment has been minimized.

Copy link

commented Nov 5, 2018

My +1 goes to a single TimeSeries object for three reasons:

  1. A single object is easier for new users; no need to read up about intricate differences between two classes to get started.
  2. I can't think of any significant differences between the methods offered by SampledTimeSeries and BinnedTimeSeries. (But maybe someone else can?)
  3. As Tom points out, many sampled time series are effectively binned by the detector across some exposure time. It seems sensible for every TimeSeries object to be explicit about this.

Just my two cents. (This is an exciting APE, thanks everyone!!)

@abigailStev

This comment has been minimized.

Copy link

commented Nov 5, 2018

@astrofrog Binned time series. I very much agree with @barentsen's point 3 -- this is true for us, and then we often bin up the time series to have larger time bins than the intrinsic time resolution of the data.

@mhvk

This comment has been minimized.

Copy link
Contributor

commented Nov 5, 2018

I think there is a real difference between a time series of events/counts such as one gets from Fermi and a stack of exposures. They also imply rather different strategies for binning, fourier analysis, etc. Another very different type of time series is what one gets from a bolometer scan (think, south pole telescope; WMAP; ...).

Generally, it may be good to think about providing a good match to existing time series formats, such as Fermi/Chandra even lists and the typical operations one does on those (hence, cc @taldcroft, @cdeil).

One particular feature that I would look for in event lists is the ability to easily bin in multiple columns, e.g., to construct an image by binning in X, Y, a spectrum by binning in energy, a phase lightcurve by binning in phase (calculated from the times, of course). The relevant numpy routing is bincount, but it would need to be multidimensional. Quite possibly, gammapy already does this...

@cdeil

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

The design questions you mention, e.g. how to support different kinds of TimeSeries, are tricky. I don't think there is a universal and best answer, each of the options you are considering are valid and work well for one set of use cases, and not so well for others.

I'm not sure if I have useful input here, but I'll quickly describe what we have in Gammapy in case it helps.

In Gammapy we used to sub-class astropy.table.Table for example for EventList and LightCurve, but by now changed those classes to wrap an astropy.table.Table, preferring "composition over inheritance". I.e. basically we do this:

class LightCurve:
    def __init__(self, table):
        self.table = table

and then we have some properties and methods on those classes that compute derived quantities or create new objects. We expose the table data member as a public property, it's not an implementation detail. Basically we treat the objects as read-only, sometimes we use cached properties. We don't need a lot of operations relating to time that would require a table index or groupby or database options like append row. We always have full time column, we don't need the special cases discussed above of regularly sampled time which gives a bit of performance benefits. (Do the others really need this, is it worth to re-implement that part of pandas in Astropy?) To support sampled we use tables with a time column, and for binned with time_min and time_max, with time_delta as a property computed as time_max - time_min. Some of the methods will only work with one or the other, and in those cases they error out, e.g. if you access the time_delta property on a LightCurve that doesn't have time_min and time_max columns.

I'm not sure about this design to put sampled and binned in a single class. E.g. in https://sherpa.readthedocs.io/en/latest/data/index.html#unbinned-data you see that they have separate classes for these two cases (there called Data1D and Data1DInt). Then you'll get a more tailored API for each case, but also some code duplication, and some users might want to have both time and time_min / time_max columns. e.g. in gamma-ray astronomy you often get that: you measure between time_min and time_max, but then plot and use in further analysis as if the measurement was taken at a single time, i.e. people put all three columns where measuring a lightcurve.

I see that in your prototyping and description here you plan to sub-class QTable. My gripe with it is that I find it difficult (both from a developer and user perspective) what to put in the methods that are overwritten in the subclass, especially __init__ and read and write and a few others like repr. One has to mix in the QTable and TimeSeries in those methods in a complex way. I think either design is possible, each with it's own advantages and problems. Especially if you plan to implement many fancy ways to initialise different kinds of time index columns, you could consider taking a close look at how pandas does this. My understanding is that basically they achieve this by having different special time index and range classes, and then use the one DataFrame class, instead of creating different kinds of DataFrame classes like BinnedTimeSeries etc. If there's really enough motivation to implement this (several 1000 lines of code?), maybe consider to copy the pandas design to introduce special time index classes and re-do it with Astropy Table and Time objects? Maybe it's even possible to wrap the pandas time index classes and have the implementation based on this, i.e. wrap instead of re-implement?

In http://docs.lightkurve.org/_modules/lightkurve/lightcurve.html#LightCurve I see that they have a design that's not based on Table, but just tailored to a small fixed sub-set of allowed columns like time, flux and flux_err each stored as a separate Numpy array. I think basing the TimeSeries design on Table like you do here is better, because it's more flexible. E.g. many users will have e.g. a systematic error for each measurement or other kinds of quantities, and then in a Table or DataFrame those can be extra columns, whereas with lightcurve you can't use that class.

@hamogu

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

A comment on eventlists raised by @mhvk: In typical X-ray analysis, eventlists and lightcurves are different things. Taking Chandra as an example, ACIS has a read-out time of 3.2 so you get a list of detected photons with a time resolution of 3.2 s, and each photon essentially has a time, an energy and a position. These event lists can then be binned on time to create a lightcurve. In principle, one could do a lightcurve with 3.2 s time bins, but that would (for typical objects) give you many bins with 0 photons, a few with 1, and even less with 2 or more photons. For many applications that's not very useful, e.g. finding flares, looking for periods, so instead we make lightcurves that have e.g. 30 min in each bin. This would then be a sampled time series, but in practice is often used like a binned time series, e.g. people run a Lomb-Scargle periodogram using the mid-point of the bin.

@abigailStev : I know there are other observing modes for brighter objects with more counts, but the vast majority of Chandra or XMM observations uses "full frame" imaging. I'm all for a base class that can be used for everything, I'm just saying that even from the X-ray perspective, lightcurves that are so long that the time column needs to be computed on-the-fly to save memory are rare (meaning not required by most users).

@hamogu

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

My thoughts on @astrofrog original question: I agree with @barentsen and would prefer a single TimeSeries class.
From the practical point, I don't see many functions that treat the two cases separately yet, e.g. if we had a function for LombScarglePeriodogramForSampledData and LombScarglePeriodigramForBinnedData then that would make sense.(*) In practice, almost all the operations I currently do on lightcurves ignore the difference. In some cases, that's mathematically correct, in others (e.g. fitting a function to the shape of the lightcurve) it's not correct, but for lightcurves with relatively small bins the difference between sampled or binned is much smaller than the statistical uncertainty anyway.

I expect that those people who work in fields where the difference between samples and binned matters already know what they are doing and can set a flag DoThisToLightcurve(lightcurve, binned=True) or select the right function when needed.

*: Actually, I only think it makes sense to differentiate once there are dozens of functions that work differently on binned and sampled lightcurves and I just don't think that that's going to be available outside of very specialized packages - and those packages can subclass the astropy TimeSeries into subclasses themselves if they need to.

@mhvk

This comment has been minimized.

Copy link
Contributor

commented Nov 6, 2018

If one does go for a single class, please be sure that time then is the centre of the bin, not the start; that will avoid countless silly errors...

@barentsen

This comment has been minimized.

Copy link

commented Nov 6, 2018

Serious question: on the stock markets, do financial analysts use different tools for transactional data (photon events) and historical stock prices (lightcurves)? :-)

I imagine a case could be made for a separate PhotonEventList class, which might offer several specialized methods that are not applicable to a generic TimeSeries class, e.g. extract_image(time_range=..., energy_range=...) and extract_lightcurve(energy_range=...).

The design of such a class should be driven by real use cases of course, i.e. common Chandra/Fermi/XMM analysis tasks.

@astrofrog

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

Thanks everyone for your feedback so far! Just a few notes based on an in-person discussion with @Cadair and @SolarDrew today:

  • Regarding specialized classes, such as photon event lists: as for NDData, any specialized class should be a sub-class of the base classes rather than trying to fit all functionality into the base classes. For example, a PhotonEventList (which IMHO is a time series) could have specific methods for binning in space or time (as mentioned by @barentsen and @mhvk). In fact, we are thinking of moving the existing downsampling and folding methods from the base classes to functions to keep the base classes as agnostic as possible about how to do various operations.

  • We can see arguments for one and for two base classes. On one hand users might find it easier to just use a single class, but there are conveniences (such as properties to get the start/end time of bins or the bin width) that would justify having a separate class for binned time series. So our new proposal is to have a TimeSeries class which are what we were calling SampledTimeSeries before, and to then have BinnedTimeSeries that is a sub-class of TimeSeries that adds a column for the bin size, and convenience properties for the start/end time of each bin. In this way, any function that works with a TimeSeries would work with a BinnedTimeSeries by default and could choose to do something extra because of the binning if they want (but wouldn't require re-writing all functions twice as @hamogu pointed out). A consequence of this is that in binned time series, time would indeed refer to the center of the bin (alleviating @mhvk's concerns).

  • We think it still makes sense to sub-class Table/QTable, rather than wrap them as @cdeil suggests. Time series objects really are tables and all the existing methods on tables should apply. In our preliminary implementation, we haven't really run into anything that needs to be disabled because it isn't applicable.

Let us know if you have any thoughts about this, and in particular the new proposal for the TimeSeries and BinnedTimeSeries classes.

@hamogu

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

@astrofrog A PhotonEventList differs from a standard one-band TimeSeries in that there could be many entries with the same time (if more than one photon was detected in those 3.2 s), but at different x,y detector coordinates. In a PhotonEventList thus Time is not a unique index and "sorting in time" can be ambiguous. This is getting into High-energy astronomy specifically, but in my view, a PhotonEventList is just a table, which can then be binned in time (typically with some selection criteria on energy or x,y position) to produce a TimeSeries.

@cdeil

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

In a similar direction to what I think @hamogu said above, I'm also not sure if people need or would want to represent photon event lists as TimeSeries objects. At least in gamma-ray astronomy, and I think probably generally, the time-related operations done on event lists are very limited and simple and it's OK to just use Table. The two main operations are filtering the rows of events with times in given "good time intervals" (again represented by a table of start / stop times) and binning events, which is basically a call to numpy, I'm not sure if I would want to create TimeSeries objects for that.

The current APE doesn't mention use cases yet that you want to address with these classes. Can you add some info on that? Is it mostly about light curves or are there other things also?

We think it still makes sense to sub-class Table/QTable, rather than wrap them as @cdeil suggests. Time series objects really are tables and all the existing methods on tables should apply. In our preliminary implementation, we haven't really run into anything that needs to be disabled because it isn't applicable.

The biggest difference in your current implementation is __init__. Looking at the current APE here I see many examples how to create TimeSeries with one time and one data column. Is it possible to create TimeSeries with multiple data columns (like flux and flux_error for example) or if I already have a Table object with a time column? I think explaining this or giving an example in the APE would be very useful. If __init__ becomes too complex you could add alternative constructors like from_table, or move the current __init__ to a from_something and keep the main __init__ for creation from Table.


So our new proposal is to have a TimeSeries class which are what we were calling SampledTimeSeries before, and to then have BinnedTimeSeries that is a sub-class of TimeSeries that adds a column for the bin size, and convenience properties for the start/end time of each bin. In this way, any function that works with a TimeSeries would work with a BinnedTimeSeries by default and could choose to do something extra because of the binning if they want (but wouldn't require re-writing all functions twice as @hamogu pointed out). A consequence of this is that in binned time series, time would indeed refer to the center of the bin (alleviating @mhvk's concerns).

I think this is probably the best solution if you want a BinnedTimeSeries to be a TimeSeries. This is also mentioned in an OGIP memo from the 90s here, they called the extra column TIMEDEL.

There will still be many users that have (TSTART, TSTOP) instead of (TIME, TIMEDEL), and probably also most time interval based methods / algorithms are (TSTART, TSTOP) based (see https://github.com/chaimleib/intervaltree for examples). So you need to be clear about time sorting requirements, and could also consider supporting BinnedTimeSeries creation directly from TSTART and TSTOP, and making those cached properties for efficiency reasons, if you imagine the time series to be a read-only data structure. This really depends on what use cases and algorithms you have in mind for BinnedTimeSeries, if efficiency is not a concern of course it's best to avoid caching and to have a simpler implementation and to allow updating time-related values in the table any time (not sure it there's really use cases to do this, usually data is given and fixed).


In the APE I see interpolation as a feature you want to add. For that you could consider using scipy.interpolate.InterpolatedUnivariateSpline. I did that here, in my case not really because it's a flexible and good interpolation method, but because it has an integral method that is useful to get the mean flux for a given time interval.

@dstansby

This comment has been minimized.

Copy link

commented Nov 6, 2018

@astrofrog A PhotonEventList differs from a standard one-band TimeSeries in that there could be many entries with the same time (if more than one photon was detected in those 3.2 s), but at different x,y detector coordinates. In a PhotonEventList thus Time is not a unique index and "sorting in time" can be ambiguous. This is getting into High-energy astronomy specifically, but in my view, a PhotonEventList is just a table, which can then be binned in time (typically with some selection criteria on energy or x,y position) to produce a TimeSeries.

More generally, I think you're describing something that is a function of time and one or more other variables (in this case spatial). This is also common in heliospheric physics when we measure the velocity distribution function of particles; measurements are counts as a function of time, speed, and two angular coordinates (f(t, v, theta, phi)). Support for some sort of pandas or xarray multi-indexing would be really great to have, but is probably another project altogether unless Table already has that feature.

@astrofrog

This comment has been minimized.

Copy link
Member

commented Nov 7, 2018

@hamogu @cdeil @dstansby - thanks for your comments! Just a few notes below:

In a similar direction to what I think @hamogu said above, I'm also not sure if people need or would want to represent photon event lists as TimeSeries objects.

That's fine - I hadn't appreciated the non-unique time of event lists. I guess what I meant is just that when people are talking about domain-specific functionality here, this could be included in subclasses of TimeSeries if TimeSeries is the appropriate class to subclass from.

The current APE doesn't mention use cases yet that you want to address with these classes. Can you add some info on that? Is it mostly about light curves or are there other things also?

This should be clarified - I think we can be broader than just lightcurves, for instance radial velocity curves or gravitational wave time series. It doesn't mean we can't envisage having subclasses for each of these types though (but the idea here is to start very general)

The biggest difference in your current implementation is init. Looking at the current APE here I see many examples how to create TimeSeries with one time and one data column. Is it possible to create TimeSeries with multiple data columns (like flux and flux_error for example) or if I already have a Table object with a time column?

In the current implementation I'm working on, yes. I agree we should expand on this in the APE.

I think explaining this or giving an example in the APE would be very useful. If init becomes too complex you could add alternative constructors like from_table, or move the current init to a from_something and keep the main init for creation from Table.

That's a good idea, I'll see what would make sense with the current implementation.

@abigailStev

This comment has been minimized.

Copy link

commented Nov 7, 2018

In terms of event lists: while this is how my data is produced, I use these events (in a Table) to populate a light curve with even time sampling then do my analysis with those light curves. So, an EventList-type object wouldn't get much use from me (and others in X-ray [spectral-]timing).

@hamogu: I wouldn't discount the entirety of X-ray timing and spectral-timing. Data from RXTE, NICER, and Astrosat LAXPC (and in the future, eXTP and maybe STROBE-X) require the ability to handle long time series, even just for a single segment (like, 32 seconds of data) if it's finely binned.

Using time in the center of the bin (re: @mhvk) works for me as long as this is very clearly documented!! (knowing where in the bin your timestamp is has caused many a headache in the past)

I agree with @astrofrog about having broad use-cases like radial velocity and grav. wave time series! And I agree with @dstansby about having support for multiple dimensions (like spatial or energy information) in one time series segment (i.e., with the same timestamps) -- this would be super duper awesome.

@mwcraig

This comment has been minimized.

Copy link
Collaborator

commented Nov 9, 2018

I'm late to this, but have a couple things to add:

  • I like the TimeSeries/BinnedTimeSeries option more than the original Sampled... and Binned...
  • I'd propose (this may be a little extreme) that in the BinnedTimeSeries there not be a time property at all. Instead, I'd go with the more explicit start_time, end_time and mid_time (or something along those lines). It is a little more verbose but has the virtue of being very clear what time you were using without needing to track down the documentation.
@gbelanger

This comment has been minimized.

Copy link

commented Nov 10, 2018

Good morning,

I just read through this thread and find the discussion very useful and important. I have implemented, over the past 12 years, a bunch of packages for time domain analysis in Java, and have had to make all the choices that are being discussed above in terms of how to represent time-related data.

Because my background is high energy astrophysics, it was clear from the very start that I needed to have EventList and TimeSeries as different objects. As it was mentioned above, EvenList can have other characteristics in addition to arrivalTime such as x, y, energy, and a quality flag, for example. In the TimeSeries this is not the case. But there can be fundamentally different kinds of time series, like what I've called CodedMaskTimeSeries that carries a lot more information than just intensity as a function of time, because the time series is actually reconstructed from images, and the errors depend on the angle at which the source was in the field of view, and on and on.

So, I have a package for event lists and a package of time series, and in the time series package, I have an interface ITimeSeries, and abstract class AbstractTimeSeries, and two concrete classes TimeSeries, and CodedMaskTimeSeries going from the general to more specific, each class inheriting from the previous and each implementing the interface. I also have an interface ITimeSeriesFileReader with specific file readers that implement it for different kinds of input files (fits, ascii, qdp), such that the TimeSeriesReader just loops through to try each implemented type when given an input file. Similarly, there are several TimeSeriesFileWriters.

The similarity between an EventList and TimeSeries in terms of its temporal information carrying capacity is used in the factory classes that can handle time data: the TimeSeriesMaker and PeriodogramMaker. These have methods to deal with event lists and with time series as inputs. For the periodograms, it is again with an AbstractPeriodogram that is used as the base, and then different types that extend it (FFTPeriodogram, RayleighPeriodogram, LombScarglePeriodogram, ZPeriodogram, ModifiedRayleighPeriodgram, LikelihoodPeriodogram, as well as AggregatePeriodogram to combine periodograms and AveragePeriodogram to express the result, which is the only periodogram that has uncertainties on the powers, because they are not used in any other one since the power a each frequency in a periodogram is a point estimate.

Lastly, I have been recently re-writing the resampling of time series in order to make it more manageable, because dealing with resampling of a binned time series in a general way is very complicated. My time series are allowed to have both irregular sized bins and irregular spaces in between them, and everything is written to handle that. So, I have been working on decomposing the time series into a collection of intensity bins made up of (composition) an Intensity and a Bin, each of which carries specific information that defines everything we need to know. In this way, we can handle each IntensityBin one at a time. This is ongoing and I'm still sorting out difficulties like how to deal with different kinds of intensities (absolute quantity or density, for example) in a general manner.

But in conclusion, I think that firstly, having EventList and TimeSeries objects makes sense, because an EventList is not really a kind of TimeSeries even though its arrival times can be used to make one. And secondly, it is very important to look down the line and consider all the kinds of things that will be done with the EventList and TimeSeries (e.g. periodograms or resampling) so that it doesn't become too difficult to do them when we get there. Please let me know if you want more details about anything addressed in this comment. I'd be happy to help further. Unfortunately, I haven't yet started coding in python. Maybe it's the time to start ;) I also need to update my repos on GitHub which I promise to do soon.

@astrofrog

This comment has been minimized.

Copy link
Member

commented Nov 16, 2018

@gbelanger - thank you for your insight - it's always really helpful to know that people have gone through similar design questions in the past and see how they have solved them. I think it would be really helpful if you take a look at what we have so far (once ready to try out) and see if you have any suggestions (and it would be great if you were interested in contributing!)

@astrofrog

This comment has been minimized.

Copy link
Member

commented Nov 16, 2018

I've now had a chance to digest all the suggestions above and have now updated the prototype implementation. I've also now put the implementation in a standalone package to make it easy to try out. The repository and docs are here:

:octocat: https://github.com/aperiosoftware/astropy-timeseries
📖 https://astropy-timeseries.readthedocs.io/en/latest/

It would be really helpful if people here can take it for a spin to see if you like the current API. The way I've set up this package is such that once we open a PR to astropy core, all commits will be preserved. Therefore, feel free to open pull requests, or issues! If you'd like to work on some of the open issues, just leave a comment!

You can easily install the package with:

pip install git+https://github.com/aperiosoftware/astropy-timeseries.git

and note that things will work better if you have the latest developer version of Astropy (or the 3.1rc1).

A few updates based on the above discussion:

  • I've decided to stick with separate classes for the binned and sampled time series for now, but calling the sampled one TimeSeries rather than SampledTimeSeries

  • TimeSeries has a time attribute, while BinnedTimeSeries instead has time_bin_start, time_bin_center, time_bin_end, time_bin_size attributes to make sure everything is explicit.

  • I've moved some of the functionality (e.g. the downsampling) out of the base class since the implementation that's there may not be appropriate for all cases. In general I think the base time series classes should be as generic as possible, and we or affiliated packages can then implement sub-classes if needed.

  • I don't think we need to take a stance on event lists - if people want to use TimeSeries for event lists, they are free to do so, but some packages may want to implement their own specialized event list class.

Let me know if you have any feedback!

@StuartLittlefair

This comment has been minimized.

Copy link

commented Nov 18, 2018

I'm a little late to this party, but having looked at and tried the prototype I have a suggestion which might make the TimeSeries class more powerful. Unfortunately I think it would mean a quite different implementation.

If I imagine my use cases for a class, it is to handle collections of data, which have a matching time attribute. Typically, data has values, uncertainty, and perhaps a mask indicating data quality - just like the NDData class.

It would also be incredibly powerful if arithmetic on TimeSeries objects performed arithmetic on matching "columns", whilst handling the uncertainties and masks automatically. Showing, an example, wouldn't it be great if something like this worked...

flux_1 = NDDataArray(data=y_1, uncertainty=StdDevUncertainty(yerr_1), flags=flags_1) 
flux_2 = NDDataArray(data=y_2, uncertainty=StdDevUncertainty(yerr_2), flags=flags_2) 
ts_1 = TimeSeries(time=t, data={'flux': flux_1})
ts_2 = TimeSeries(time=t, data={'flux': flux_2})
rel_ts = ts_1/ts_2

...and the rel_ts object also had a flux entry which had the correct values, uncertainty and mask? Users could subclass NDDataRef to get have the flags combine anyway they'd like.

Such behaviour is compatible with the APE as it stands, but can't be achieved if a TimeSeries is a subclass of Table, since they don't support having NDDataArray as a column, as far as I'm aware. One would lose the ability to join, stack etc - but I'd argue that the functionality above is more powerful for many users. I've posted a highly incomplete prototype to illustrate how it might be implemented here.

Sorry if this suggestion comes after many people have done a lot of hard work; I hadn't realised this issue until I started to play with the implementation. Feel free to ignore this, or point out glaring errors

@astrofrog

This comment has been minimized.

Copy link
Member

commented Nov 19, 2018

@StuartLittlefair - thanks for the suggestion! I agree that using NDData might work well for some use cases. I'd like to make two suggestions:

  • In specutils, we made a spectral mix-in class that can be used to make one of the NDData dimensions be a spectral axis. We could do the same for time, i.e. make a mix-in class that provides an easy way to set up one of the axes of an NDData cube as a time axis. Note that this would then also allow one to set up cubes with two spatial, one spectral, and one time axis. To me, this is complementary to the Table approach - in some cases people want to think of time series as being simple tables, and in others people might want to think of it as an n-dimensional cube (where n>=1) where one axis is time. (aperiosoftware/astropy-timeseries#16)

  • There's no reason we can't make NDDataArray be used as a mix-in column in tables - in fact I'll open an issue about that now! (astropy/astropy#8159)

I think both of these can be done of course, we don't have to choose.

I'll also mention that for NDData, we changed the API a few times because we ended up making the base class try and be too smart, and some functionality is best left to sub-classes. I think we will need to do the same here - that is, functionality like rel_ts = ts_1/ts_2 makes too many assumptions to belong in a base time series class. However, we could always develop a sub-class that behaves in the way you indicate, as a convenience layer.

I'll try and work on an NDData mix-in class implementation to see how much work it might be and whether it would suit your use case. I'll also see if I can re-use some of your prototype code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.