Develop the next version of the LightCurve API #1520
Comments
Disagree! Pandas is now standard in the scientific python stack so it is as much of a "dependency" as numpy, scipy, or matplotlib. Also it provides much more functionality than astropy tables do (and will continue to get more capabilities over time). I am strongly against this. |
I agree that it would be great for Pandas to support units. Perhaps we can hack that in somehow. |
@ehsteve We would lose no functionality by doing this, as we would still provide access to the dataframe. This is about making it easier for us to maintain the underlying code. The dependency point is the least important of the ones I made above, although it would still make certain things easier. I do not think that "hacking" units onto pandas is ever going to be a good idea, as it will cause us much more work rather than less, and reducing work is my main motivation. Also, if we are going to move to astropy time everywhere else in the code base, having this as an astropy table will make it much easier to work with. I think we need to think about the functionality of pandas that we wrap, i.e. not enable users to use by
these three core things that we need (and we don't even use the last two in the current implementation) can all be provided by |
To evidence my statement that it would make things easier for us if done properly, I would like to hold up @DanRyanIrish's GOES code: See things like this: On the topic of time, consider our future need for epoch conversions, such as going between TAI time and UTC for JSOC. Or consider downloading lightcurve data that has the time column in "utime", where the user then wants to download data from JSOC based on the peak intensity in a LightCurve. In our current model that would involve doing a utime -> UTC -> TAI conversion (and losing accuracy at each step), whereas if we could use Astropy Table, with an Astropy Time index column, we could do utime -> TAI time, only when passed into the These two things are just to highlight how using more of the Astropy infrastructure should make our lives simpler. |
I also agree that pandas is great, and to clarify I think that having it as a dependency is not a practical problem at all these days. |
I was unable to make the dev meeting last night so perhaps you can tell me if a decision was made on this. I see real advantages to switching to astropy tables, as @Cadair showed with my GOES code. Although pandas has some great functionality, is there anything we are likely to want to do with LightCurve data that pandas would make far easier/better than astropy tables? |
We did not discuss this at the dev meeting; we had lively debates about other things, like the nature of an example gallery. I am with @DanRyanIrish on this. I have pointed out a couple of things that Astropy tables would improve over pandas, but I have yet to see any comments about what Pandas brings to the table that we need in order to implement the LightCurve object. |
The only question I can think to ask is whether accessing and manipulating a pandas dataframe is significantly faster than doing the same with an astropy table. If people want to process a lot of data, e.g. years of GOES data or a lot of high cadence LYRA data, that might be something to consider if we want people to use the LightCurve object rather than reading FITS data straight into their own numpy arrays. |
@DanRyanIrish That's an interesting question, and something that's worth investigating later on. |
Probably worthwhile to come up with a list of operations you would want to So, to kick things off, here is a list of operations and use cases.
truncation
merging
subsampling
resampling
summing
sorting
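As a rough sketch of what these operations look like on the current pandas-backed data, assuming a hypothetical one-hour light curve with a single `flux` column at 1-second cadence (the data and column name are invented for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one hour of 1-second-cadence flux values.
index = pd.to_datetime("2015-08-04") + pd.to_timedelta(np.arange(3600), unit="s")
lc = pd.DataFrame({"flux": np.ones(3600)}, index=index)

# Truncation: slice on the time index (both end points are inclusive).
truncated = lc.loc[pd.Timestamp("2015-08-04 00:10:00"):pd.Timestamp("2015-08-04 00:19:59")]

# Merging: concatenate two overlapping pieces and drop repeated timestamps.
merged = pd.concat([lc.iloc[1000:], lc.iloc[:2000]])
merged = merged[~merged.index.duplicated(keep="first")]

# Sorting: the merged rows are out of time order, so sort on the index.
sorted_lc = merged.sort_index()

# Subsampling: keep every 10th sample.
subsampled = lc.iloc[::10]

# Resampling: average the samples in each 1-minute bin.
resampled = lc.resample("1min").mean()

# Summing: total flux in each 1-minute bin.
summed = lc.resample("1min").sum()
```

None of this carries units, which is the gap the rest of the thread is concerned with.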
|
@wafels thanks, that's great. Are you suggesting that we make all that functionality available through the LightCurve API? |
In answer to @Cadair I would vote yes on that. |
I don't know. What would be the alternative - a bunch of independent
lightcurve.operation.truncate(my_lc, "2010-01-01", "2010-01-02 12:34:58")
lightcurve.operation.sum(my_lc, bins=60*u.s)
lightcurve.operation.merge(my_lc1, my_lc2, overlap=np.mean)
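To make the two styles concrete, here is a small sketch contrasting a method on the object with a standalone function. The `LightCurve` class below is a hypothetical stand-in wrapping a time-indexed DataFrame, not the actual SunPy class, and only truncation is shown.

```python
import numpy as np
import pandas as pd


class LightCurve:
    """Hypothetical minimal wrapper around a time-indexed DataFrame."""

    def __init__(self, data):
        self.data = data

    def truncate(self, start, end):
        # Method style: the operation lives on the object itself.
        window = self.data.loc[pd.Timestamp(start):pd.Timestamp(end)]
        return LightCurve(window)


def truncate(lc, start, end):
    # Function style: the same operation as a standalone function.
    return lc.truncate(start, end)


# Hypothetical hourly data covering three days.
index = pd.to_datetime("2010-01-01") + pd.to_timedelta(np.arange(72), unit="h")
my_lc = LightCurve(pd.DataFrame({"flux": np.arange(72.0)}, index=index))

via_method = my_lc.truncate("2010-01-01", "2010-01-02 12:34:58")
via_function = truncate(my_lc, "2010-01-01", "2010-01-02 12:34:58")
```

Both calls return the same rows; the choice is mostly about where users expect to find the functionality.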
|
I think all of that functionality is reasonable to have in the API for lightcurve. I think this is a good way to decide on the underlying data object: if it turns out astropy table cannot do everything we need it to, then it is not suitable as a replacement. Having said that, I do not think that anything on the list given by @wafels is likely to cause a problem. |
I would add to that list
Unit manipulation
|
@DanRyanIrish Those features are practically only going to be available if we use Astropy table with quantity columns. Which, of course, is my point ;) |
Here is a summary of what has been discussed on this issue so far, including in SunPy dev meetings. @ehsteve can add this comment as a new summary for this issue.
This issue began as a debate as to whether or not to switch the LightCurve object from pandas dataframes to astropy tables. However, in order to make that decision, it was decided that we should define the functionality we need/want the LightCurve API to have. A list of required/desired functionality has been started:
Required/Desired functionality of LightCurve API
Pandas vs. Astropy Table: Open Questions
Miscellaneous
Feel free to add to this list or answer the questions raised, and to add anything else to this summary that you feel I've missed. Once we've decided the functionality of the API and answered the open questions we have, we can decide whether or not to switch LightCurve to depend on Astropy tables. |
At the most recent SunPy dev meeting the consensus was that the Lightcurve Refactor should be done using Pandas, with added support for AstroPy Units. We might migrate to an AstroPy QTable based TimeSeries class in future, when one becomes available. Part of the rationale for this was a limitation of the AstroPy Time class: it is immutable and has no insert method, so you can't rearrange or add rows to a QTable with a mixin column for Time without manually recreating the entire table. |
This issue began as a debate as to whether or not to switch the LightCurve object from pandas dataframes to astropy tables. It was triggered by work going on in astropy/astropy#3915 to enable the use of astropy table for time series (more generally, allowing a unique index column).
However, in order to make that decision, it was decided that we should define the functionality we need/want the LightCurve API to have. A list of required/desired functionality has been started:
Required/Desired functionality of LightCurve API:
LightCurve Truncation
Merging LightCurves
Subsampling
Resampling
Summing
Sorting
Unit aware
Unit manipulation
Pandas vs. Astropy Table: Open Questions
Miscellaneous