
Elijah Bernstein Cooper: Adding Distribution Capability to Quantity

Elijah Bernstein-Cooper edited this page Mar 27, 2015 · 17 revisions

GSoC 2015 Application


Sub-organization

Astropy

Student Information

University Information

  • University: University of Wisconsin-Madison
  • Degree: PhD in Astronomy (in progress)
  • Expected Graduation Date: 2017

Background

I'm a second-year PhD student in astronomy at the University of Wisconsin-Madison. My research uses large 2D and 3D observations to determine how molecular hydrogen forms.

I have written more than 10,000 lines of Python code for my research. This work uses libraries such as Astropy, NumPy, and SciPy for array and matrix manipulation, minimization, and statistical analysis of a variety of observational data. For example, I developed a tool that extracts smaller regions of the Planck satellite archival data, which are stored in the HEALPix projection, and reprojects them onto a Cartesian grid. I have also written many Python modules and analysis scripts for personal use.

My formal education in statistics and programming has prepared me well for the proposed project. I'm trained in Python, MATLAB, Java, and R. Below are the courses I have completed:

Statistics:
  • Estimations of Functions from Data
  • Theory & Application of Pattern Recognition
  • Applied Categorical Data Analysis
  • Introduction to Statistical Modeling

Computer Science:
  • Object Oriented Programming
  • Theory & Application of Pattern Recognition
  • Introduction to Scientific Programming

My previous experience and devotion to open-source software make me an excellent candidate for the Google Summer of Code 2015 program.

Project Proposal Information

Proposal Title

Bringing Accessible Statistics to Astronomy

Proposal Abstract

Astronomy relies heavily on the uncertainties of measured and modeled variables in large datasets. This summer I propose to integrate uncertainty analysis into the Astropy Python package by giving users the ability to track probability density functions within the astropy.units.Quantity class. This addition will provide a consistent framework for users to track uncertainties throughout their analysis.

Proposal Detailed Description & Timeline

The rapid growth of data available to astronomers demands rigorous statistical analysis. Despite the abundance of data, astronomers will always push observations to the noise limit, so astronomy relies heavily on the uncertainties of measured and model parameters. One robust method for estimating uncertainties is Monte Carlo sampling, which simulates an observation many times by drawing random values from the probability distributions of its inputs. Many astronomers have written their own Monte Carlo routines. My proposed project this summer is to integrate uncertainty analysis into the Astropy Python package, providing an approachable way for astronomers to perform rigorous statistics on large amounts of data.

Astropy provides a seamless approach to handling quantities with units. The class astropy.units.Quantity represents values together with their associated units. Most NumPy functions can be applied to Quantity instances, and the units are carried through each operation, which makes manipulating physical values easy and less error-prone.

We will add a subclass of Quantity called Distribution that represents the probability density function of a quantity as an array of Monte Carlo samples. One axis of the underlying array will hold the samples drawn from the distribution. Distribution will have convenience methods that let a user treat it like an ordinary Quantity by reducing the sample axis with a statistic, e.g., the median.

Next we will develop methods to propagate distributions through operations that combine them. Astronomers are often interested in the uncertainty of a variable that is a function of several parameters, each with its own probability density function. Propagating distributions through operations on Quantity instances will let users estimate the uncertainties of such variables.
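The core idea of Monte Carlo propagation can be sketched with plain NumPy arrays: apply the operation sample-by-sample, then summarize the resulting samples. In the proposed design these arrays would be Distribution instances carrying units; here units are noted only in comments, and the input values are illustrative. A first-order (linear) error-propagation estimate is computed alongside for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Two independent parameters, each represented by Monte Carlo samples.
distance = rng.normal(100.0, 2.0, n_samples)  # metres, 100 +/- 2
time = rng.normal(10.0, 0.1, n_samples)       # seconds, 10 +/- 0.1

# Propagation: evaluate the function for every pair of samples.
velocity = distance / time                     # m / s

v_median = np.median(velocity)
v_std = np.std(velocity)

# First-order propagation for v = d / t:
# sigma_v / v = sqrt((sigma_d / d)**2 + (sigma_t / t)**2)
linear_std = 10.0 * np.sqrt((2.0 / 100.0) ** 2 + (0.1 / 10.0) ** 2)

print(f"median = {v_median:.3f} m/s, MC std = {v_std:.4f}, linear std = {linear_std:.4f}")
```

For nearly linear functions with small relative errors the two estimates agree closely; the Monte Carlo approach remains valid when the function is strongly nonlinear or the input distributions are non-Gaussian, where the linear formula breaks down.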

Finally we will incorporate tools for extracting useful information from Distribution instances. These will likely be convenience functions. Examples include confidence interval estimates, medians, or a random sample from the distribution. If there is time, this could also involve expanding this system to support common analytically-representable distributions such as Gaussian and Poisson distributions.
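The convenience functions described above could look roughly like the following NumPy-only sketch. All function names (`gaussian_samples`, `poisson_samples`, `confidence_interval`, `random_draw`) are hypothetical, and the samples are again assumed to sit on the last axis.

```python
import numpy as np


def gaussian_samples(mean, std, n_samples, rng):
    """Hypothetical helper: Normal PDF(s) as Monte Carlo samples on the last axis."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    # A trailing length-1 axis lets loc/scale broadcast against `size`.
    return rng.normal(mean[..., np.newaxis], std[..., np.newaxis],
                      size=mean.shape + (n_samples,))


def poisson_samples(expected, n_samples, rng):
    """Hypothetical helper: Poisson PDF(s) with the given expected counts."""
    expected = np.asarray(expected, dtype=float)
    return rng.poisson(expected[..., np.newaxis],
                       size=expected.shape + (n_samples,))


def confidence_interval(samples, level=0.68):
    """Central interval containing `level` of the samples, per element."""
    lower = 50.0 * (1.0 - level)  # e.g. 16th percentile for level=0.68
    upper = 50.0 * (1.0 + level)  # e.g. 84th percentile for level=0.68
    return np.percentile(samples, [lower, upper], axis=-1)


def random_draw(samples, rng):
    """Return one randomly chosen Monte Carlo sample per element."""
    index = rng.integers(samples.shape[-1])
    return samples[..., index]


rng = np.random.default_rng(2015)
flux = gaussian_samples([5.0, 8.0], 1.0, 100_000, rng)  # shape (2, 100000)
lo, hi = confidence_interval(flux)
counts = poisson_samples(4.0, 100_000, rng)
print(lo, hi, np.median(counts), random_draw(flux, rng))
```

For a Gaussian, the central 68% interval is close to the mean plus or minus one standard deviation, which offers a quick sanity check on the sampled representation.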

Timeline

April 27 - May 24 (Community bonding period, 4 weeks) Design the API for the Distribution class. Outline the methods of Quantity instances that will handle the distribution (sample) axis.

May 25 - June 14 (3 weeks) Begin implementing the Distribution API. Add the sample axis to Quantity. Start a test suite covering incremental changes to Quantity.

June 22 - July 5 (2 weeks) Complete a working version of the Distribution API. Add convenience methods, similar to existing Quantity methods, so that users can track distributions easily. Document the Distribution class.

July 6 - July 20 (2 weeks) Solicit input from astropy users. Rework API based on feedback. Edit documentation.

July 20 - August 9 (2 weeks) Develop methods for operating on distributions. Add functionality to create common distributions, such as Gaussian (normal) or Poisson distributions, from a given quantity.

August 10 - August 17 (1 week) Polish documentation for use of Distribution. Provide many simple examples using each convenience method.

Other Schedule Information

I will attend a workshop from June 1 - June 6, and I plan to take a one-week holiday at the end of June.

Pull Requests for Astropy

Expanding astropy.stats.bootstrap generality
