
What methods do we want in astropy.stats?

astronomeralex edited this page Feb 28, 2013 · 35 revisions

Introduction

Currently astropy.stats (docs, code) contains only one function: sigmaclip.

  • What other common astronomy statistical methods do we want (that are not already available in numpy, scipy, or maybe scikit-learn, scikit-image, statsmodels)?
  • Are all methods general enough or is there a need for an affiliated package (e.g. for X-ray and gamma-ray Poisson counting statistics methods)?
  • Who wants to help implement / test / benchmark / document astropy.stats?

Feel free to add to the wishlist below even if you don't have the time / expertise to work on this yourself.

API and implementation details will be the next step we discuss; for now, please just list the methods you want, with links to webpages / papers / existing code that explain the basics.

Other packages; what to duplicate in astropy.stats

In general, due to limited resources, we do not plan to include methods readily available in the scientific Python stack: Numpy and Scipy (docs), specifically scipy.stats as well as scikit-learn, scikit-image, statsmodels, Pandas and sympy.

Of course there are many more useful stats-related Python packages, e.g. PyMC, pyBLoCXS, uncertainties for Gaussian error propagation ...

THE physics package for data analysis in general is ROOT, with RooFit for modelling / fitting and, most relevant here, RooStats for statistics (reference, wiki, a tutorial). ROOT does come with a Python interface, PyROOT, but ROOT, and especially the upcoming ROOT 6, is far too heavy a dependency to be considered for use in astropy.stats. In addition, most of it is GPL-licensed, so it can't be used in the BSD-licensed astropy. We probably do want some of the methods in RooStats re-coded in astropy.stats, so please look through the list of classes in the RooStats reference and mention what you need for your astro analysis.

We will add something to astropy.stats if it is not available in numpy, scipy or matplotlib but needed to implement something in the astropy core (see Dependencies at http://www.astropy.org/vision.html).

We will consider adding (copying or re-implementing) methods that are available in other scientific Python packages if the existing package is a very heavy dependency (like ROOT) or the existing class / function is very different from the way astronomers use it, so that astronomers will have trouble understanding the parameter names and docstring.

Wishlist

Wishlist for numpy / scipy

Here we collect features that really belong in numpy / scipy. Even if someone implements them, it will take a year or two until they are released and adopted by the majority of users, so we might decide to implement these features in astropy for now.

  • numpy issue #1811: "median in average O(n) time" from 2009-09-01. There is a fast median in bottleneck, see speed comparison of numpy/scipy/bottleneck/pandas here. I don't know if there is a plan to merge parts of bottleneck into numpy.
  • numpy issue #2448, "Numerical-stable sum (similar to math.fsum)" from 2011-06-02
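To make the two requests above concrete, here is a minimal sketch of what interim versions could look like. The function names `median_by_selection` and `kahan_sum` are hypothetical and not part of astropy, numpy or scipy. The median relies on `numpy.partition`, which only exists in NumPy 1.8 and later, and the sum uses Kahan compensated summation, which is more accurate than a naive loop but, unlike `math.fsum`, is not exactly rounded.

```python
import math

import numpy as np


def median_by_selection(data):
    """Median via partial sorting (selection) instead of a full sort.

    Hypothetical helper for illustration only. Assumes numpy.partition
    is available (NumPy >= 1.8).
    """
    a = np.asarray(data, dtype=float).ravel()
    n = a.size
    if n == 0:
        raise ValueError("cannot compute the median of an empty array")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the middle element after partitioning is the median.
        return float(np.partition(a, mid)[mid])
    # Even length: average the two middle elements.
    part = np.partition(a, [mid - 1, mid])
    return float(0.5 * (part[mid - 1] + part[mid]))


def kahan_sum(values):
    """Kahan compensated summation of an iterable of floats.

    Hypothetical helper for illustration only. It reduces, but does not
    eliminate, the rounding error of a naive running sum.
    """
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for value in values:
        y = value - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total


if __name__ == "__main__":
    values = [0.1] * 10
    print("naive sum :", repr(sum(values)))
    print("kahan_sum :", repr(kahan_sum(values)))
    print("math.fsum :", repr(math.fsum(values)))
    print("median    :", median_by_selection([4, 1, 3, 2]))
```

For pure-Python sequences `math.fsum` from the standard library is already available; the open question in the numpy issue is an equivalent that works efficiently on arrays.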

Links

Resources
