aggregate_downsample() API tweak to improve performance #13078

Open
orionlee opened this issue Apr 7, 2022 · 0 comments

orionlee (Contributor) commented Apr 7, 2022

Description

The following are a few ideas for tweaking the aggregate_downsample() API to improve performance in practice.
None of them require changes to the internals of the implementation. They are API changes that would let users obtain some performance gains.

1. Change the default of the aggregate_func parameter to use a faster median implementation when one is available.
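A minimal sketch of what "use a faster median when available" could look like. The issue does not name the faster implementation; this assumes it is `bottleneck.nanmedian` from the optional third-party bottleneck package, with NumPy's `np.nanmedian` as the fallback. The name `fast_nanmedian` is hypothetical.

```python
import numpy as np

# Assumption: the "faster median" is bottleneck's nanmedian (an optional dependency).
# If bottleneck is not installed, fall back to NumPy's NaN-ignoring median.
try:
    import bottleneck as bn
    fast_nanmedian = bn.nanmedian
except ImportError:
    fast_nanmedian = np.nanmedian

values = np.array([1.0, np.nan, 3.0, 2.0])
print(fast_nanmedian(values))  # median of the non-NaN values: 2.0
```

Either function ignores NaNs, so a default chosen this way would give the same result regardless of which backend is picked.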

2. Let users optionally specify a subset of columns to bin, via a new optional columns parameter.

A TimeSeries from actual observations often has many columns. In practice, users may not care about downsampling all of them.

E.g., a TESS SPOC light curve FITS file has 20+ columns. If users only want to bin the flux, they could call aggregate_downsample(ts, columns=['flux', 'flux_err'], time_bin_size=10*u.minute), which could cut the running time by roughly a factor of 10.
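To make the cost argument concrete, here is a minimal pure-NumPy sketch (not astropy's implementation) of a downsampler that accepts the proposed `columns` argument. The helper name `downsample`, its dict-of-arrays input, and binning from the earliest timestamp are assumptions for illustration. The point is that each selected column costs one aggregate_func call per bin, so leaving out unneeded columns skips that work entirely.

```python
import numpy as np


def downsample(time, data, time_bin_size, columns=None, aggregate_func=np.nanmean):
    """Hypothetical helper: bin ``data`` (a dict of name -> 1-D array) into
    fixed-width time bins, aggregating only the names listed in ``columns``."""
    time = np.asarray(time, dtype=float)
    start = time.min()
    # Assign each sample to a bin measured from the earliest timestamp
    bin_index = np.floor((time - start) / time_bin_size).astype(int)
    bins = np.unique(bin_index)  # sorted; empty bins are skipped in this sketch
    names = list(data) if columns is None else list(columns)
    result = {"time_bin_start": start + bins * time_bin_size}
    for name in names:
        values = np.asarray(data[name], dtype=float)
        # One aggregate_func call per (column, bin): cost grows with the column count
        result[name] = np.array([aggregate_func(values[bin_index == b]) for b in bins])
    return result


time = np.array([0.0, 1.0, 2.0, 10.0, 11.0])  # e.g. minutes
data = {
    "flux": np.array([1.0, 2.0, 3.0, 4.0, 6.0]),
    "flux_err": np.array([0.1, 0.1, 0.1, 0.2, 0.2]),
    "quality": np.array([0, 0, 1, 0, 0]),
}
binned = downsample(time, data, time_bin_size=5.0, columns=["flux", "flux_err"])
print(sorted(binned))  # ['flux', 'flux_err', 'time_bin_start']
print(binned["flux"])  # [2. 5.]
```

Here the `quality` column is never aggregated, which is the kind of saving the proposed parameter would offer for the other 20-odd columns of a TESS light curve.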

3. Let users specify a different aggregate_func for error columns.

Taking a TESS SPOC light curve FITS file as an example again, to properly bin the flux one currently has to:
i. call aggregate_downsample() once to get the binned flux;
ii. call aggregate_downsample() again, with root mean square as the aggregate_func, to get the binned flux_err.

The two calls could be reduced to one if aggregate_downsample() let users optionally specify the aggregate_func on a per-column basis.

In terms of API changes, one option is to add a new optional aggregate_func_selector parameter that specifies the aggregate_func on a per-column basis.

With such a change, binning a TESS light curve could be done in one call that handles both regular columns and error columns:

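```python
import numpy as np
import astropy.units as u
from astropy.timeseries import aggregate_downsample

def root_mean_square(values):
    return np.sqrt(np.nanmean(np.square(values)))

def tess_aggregate_func_selector(colname):
    # Combine error columns (e.g. flux_err) with RMS, everything else with the median
    return root_mean_square if colname.endswith('_err') else np.nanmedian

# ts: a TimeSeries read from a TESS SPOC light curve file.
# aggregate_func_selector is the parameter proposed above; it does not exist yet.
ts_b = aggregate_downsample(ts, time_bin_size=10*u.minute, aggregate_func_selector=tess_aggregate_func_selector)
```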
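A pure-NumPy sketch of how a per-column selector could be honored in a single pass. `downsample_with_selector` and `selector` are hypothetical names, not astropy functions, and the dict-of-arrays input is an assumption for illustration. In this sketch the bin assignment is computed once and reused for every column, which is one way a single call can be cheaper than the two separate calls described above.

```python
import numpy as np


def downsample_with_selector(time, data, time_bin_size, aggregate_func_selector):
    """Hypothetical helper: bin every column of ``data`` (name -> 1-D array),
    choosing each column's aggregation function via ``aggregate_func_selector``."""
    time = np.asarray(time, dtype=float)
    start = time.min()
    # Bin assignment is computed once and shared by all columns
    bin_index = np.floor((time - start) / time_bin_size).astype(int)
    bins = np.unique(bin_index)
    result = {"time_bin_start": start + bins * time_bin_size}
    for name, values in data.items():
        func = aggregate_func_selector(name)  # one lookup per column
        values = np.asarray(values, dtype=float)
        result[name] = np.array([func(values[bin_index == b]) for b in bins])
    return result


def selector(colname):
    # RMS for error columns, NaN-ignoring median for everything else
    if colname.endswith("_err"):
        return lambda v: np.sqrt(np.nanmean(np.square(v)))
    return np.nanmedian


time = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
data = {
    "flux": np.array([1.0, 5.0, 3.0, 4.0, 6.0]),
    "flux_err": np.array([1.0, 5.0, 7.0, 2.0, 2.0]),
}
binned = downsample_with_selector(time, data, 5.0, selector)
print(binned["flux"])      # [3. 5.]
print(binned["flux_err"])  # [5. 2.]
```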

