What methods do we want in astropy.stats?
Currently astropy.stats (docs, code) contains only one function: sigmaclip.
- What other common astronomy statistical methods do we want (that are not already available in numpy, scipy, or maybe scikit-learn, scikit-image, statsmodels)?
- Are all methods general enough or is there a need for an affiliated package (e.g. for X-ray and gamma-ray Poisson counting statistics methods)?
- Who wants to help implement / test / benchmark / document astropy.stats?
Feel free to add to the wishlist below even if you don't have the time / expertise to work on this yourself.
API and implementation details will be the next step we discuss; for now, please just list the methods you want, with links to webpages / papers / existing code that explain the basics.
In general, due to limited resources, we do not plan to include methods readily available in the scientific Python stack: Numpy and Scipy (docs), specifically scipy.stats as well as scikit-learn, scikit-image, statsmodels, Pandas and sympy.
Of course there are many more useful stats-related Python packages, e.g. PyMC, pyBLoCXS, uncertainties for Gaussian error propagation ...
THE Physics package for data analysis in general is ROOT, for modelling / fitting it is RooFit, and, most relevant here, for statistics it is RooStats (reference, wiki, a tutorial). It does come with a Python interface, PyROOT, but ROOT, and especially the upcoming ROOT 6, is far too heavy a dependency to be considered for use in astropy.stats. In addition, most of it is GPL-licensed, so it can't be used in the BSD-licensed astropy. We probably do want some of the methods in RooStats re-coded in astropy.stats, so please look through the list of classes in the RooStats reference and mention what you need for your astro analysis.
We will add something to astropy.stats if it is not available in numpy, scipy or matplotlib but needed to implement something in the astropy core (see Dependencies at http://www.astropy.org/vision.html).
We will consider adding (copying or re-implementing) methods that are available in other scientific Python packages if the existing package is a very heavy dependency (like ROOT) or the existing class / function is very different from the way astronomers use it, so that astronomers will have trouble understanding the parameter names and docstring.
- Poisson count statistics (see github issue #611)
- Functions to compute excess, excess error, significance, sensitivity, likelihood, ... for the case of known background (one counts measurement in a source region) and unknown background (two counts measurements in a source and background region)
- Likelihood and chi2 statistics: cstats, cash, various chi2 approximations (CIAO Sherpa stats)
- Bayesian methods such as this.
- Significance & Confidence intervals: Li & Ma 1983, Rolke 2005, Feldman-Cousins
- Confidence intervals for small number statistics (Gehrels 1986)
- Robust Statistics (see github pull request #621)
- MAD
- bi-weight (Beers, Flynn, & Gebhardt 1990)
- Robust line fit (covered by model fitting?)
- resistant mean
- Principal Component Analysis
- e.g., http://packages.python.org/Astropysics/coremods/utils.html#astropysics.utils.stats.Pca
- Also available in scikit-learn
- Statistical Tests
- KS test (scipy.stats)
- pearson (scipy.stats)
- spearman (scipy.stats)
- chisq (scipy.stats)
- Anderson-Darling (scipy.stats, but only implemented for one sample and four distributions)
- What other tests are commonly used by astronomers? Are they in [scipy.stats](http://docs.scipy.org/doc/scipy/reference/stats.html) already?
- Sensitivity/SNR calculations relevant for CCDs that could be useful both in imaging and spectroscopy. [TODO: needs link to description or code]
- Completeness calculations pertaining to redshift surveys [TODO: needs link to description or code]
- Inverse distance weighting interpolation for scattered data
- http://en.wikipedia.org/wiki/Inverse_distance_weighting
- Also requested to be included in scipy: http://projects.scipy.org/scipy/ticket/1497
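To make the inverse distance weighting request concrete, here is a minimal sketch of the basic (Shepard) scheme described on the Wikipedia page linked above. The function name `idw_interpolate` and its parameters are hypothetical and only for illustration; they are not an existing astropy or scipy API.

```python
import numpy as np


def idw_interpolate(xy_known, values, xy_query, power=2.0):
    """Basic inverse distance weighting (hypothetical helper, not an existing API).

    xy_known : (n, d) sample positions
    values   : (n,) sample values
    xy_query : (m, d) positions at which to interpolate
    power    : exponent of the inverse-distance weights
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)

    # Pairwise Euclidean distances, shape (m, n)
    diff = xy_query[:, np.newaxis, :] - xy_known[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    result = np.empty(len(xy_query))
    for i, d in enumerate(dist):
        exact = d == 0
        if exact.any():
            # Query point coincides with a sample: return that sample's value
            result[i] = values[exact][0]
        else:
            weights = d ** -power
            result[i] = np.sum(weights * values) / np.sum(weights)
    return result


known = [[0.0, 0.0], [2.0, 0.0]]
vals = [1.0, 3.0]
interp = idw_interpolate(known, vals, [[1.0, 0.0], [0.0, 0.0]])
```

This version computes all pairwise distances at once, which is simple but uses O(m n) memory; a real implementation for large scattered data sets would probably restrict the sum to nearby samples.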
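As an illustration of the Poisson count statistics item in the list above, the following sketch computes excess, excess error and the Li & Ma (1983, their Eq. 17) significance for the unknown-background case with one on-region and one off-region counts measurement. The function names and the `alpha` parameter (ratio of on to off exposure) are assumptions made for this example, not an agreed API, and the sign convention (negative significance for negative excess) is a choice made here.

```python
import math


def excess(n_on, n_off, alpha):
    """Background-subtracted counts in the source region."""
    return n_on - alpha * n_off


def excess_error(n_on, n_off, alpha):
    """Gaussian error propagation on the excess, assuming Poisson counts."""
    return math.sqrt(n_on + alpha ** 2 * n_off)


def significance_lima(n_on, n_off, alpha):
    """Li & Ma (1983) Eq. 17 significance; requires n_on > 0 and n_off > 0."""
    n_tot = n_on + n_off
    term_on = n_on * math.log((1 + alpha) / alpha * n_on / n_tot)
    term_off = n_off * math.log((1 + alpha) * n_off / n_tot)
    # The sum is non-negative in exact arithmetic; guard against rounding
    s = math.sqrt(2 * max(term_on + term_off, 0.0))
    # Convention chosen here: the sign follows the sign of the excess
    return math.copysign(s, excess(n_on, n_off, alpha))


sig = significance_lima(n_on=20, n_off=10, alpha=1.0)
```

The zero-counts cases (`n_on == 0` or `n_off == 0`) are deliberately not handled in this sketch; they need the limiting form of the logarithmic terms.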
Here we collect features that really belong in numpy / scipy. Even if someone implements them, it will take a year or two until they are released and adopted by the majority of users, so we might decide to implement these features in astropy for now.
- numpy issue #1811: "median in average O(n) time" from 2009-09-01. There is a fast median in bottleneck, see speed comparison of numpy/scipy/bottleneck/pandas here. I don't know if there is a plan to merge parts of bottleneck into numpy.
- numpy issue #2448, "Numerical-stable sum (similar to math.fsum)" from 2011-06-02
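For the numerically stable sum, a stopgap can be sketched on top of the standard library's `math.fsum`, which the numpy issue itself cites as the model. The wrapper name `stable_sum` is hypothetical; it only flattens the input and sums it without axis support, and it will be much slower than `numpy.sum` on large arrays.

```python
import math

import numpy as np


def stable_sum(values):
    """Sum all elements with math.fsum (hypothetical helper, no axis support)."""
    return math.fsum(np.asarray(values, dtype=float).ravel())


data = [1.0, 1e100, 1.0, -1e100]
naive = sum(data)          # the two 1.0 terms are lost to rounding
stable = stable_sum(data)  # fsum tracks the lost low-order bits
```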
- 2013-01-08, astropy-dev mailing list: "What methods do we want in astropy.stats?"
- 2013-01-15, astropy mailing list: "Missing some statistical method for your astronomy analyses in Python?"
- 2013-01-07, github issue #611: Add On / off Poisson stats methods by @cdeil
- 2013-01-12, github pull request #621: Robust statistics to stats package by @crawfordsm
- github pull request #439 Models subpackage by @ndev, specifically comment by @cdeil on whether fitting stats and optimizers and confidence interval calculators belong in astropy.stats or astropy.models or somewhere else.
- [astLib.aststats](http://astlib.sourceforge.net/docs/astLib/astLib.astStats-module.html) has a bunch of routines.
- Simple statistics with scipy
- [Statistical Computing with Python](http://www.astro.cornell.edu/staff/loredo/statpy/)
- Search on Amazon for "astronomy statistics" gives 288 results as of Jan 2013; here are some that seem like good references to me from the title / table of contents:
- Practical Statistics for Astronomers (Cambridge Observing Handbooks for Research Astronomers) by Wall & Jenkins, 2nd edition, 2012
- Modern Statistical Methods for Astronomy: With R Applications by Eric D. Feigelson, 2012
- Astronomical Image and Data Analysis (Astronomy and Astrophysics Library) by J.-L. Starck, 2006. (some statistics, but also image processing, ...)
- Statistics, Data Mining, and Machine Learning in Astronomy by Zeljko Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray, to be published in early 2013. (not published yet, but comes with BSD-licensed Python code)