
GSoC 2015 Application Moataz Hisham: Implement Distribution Support for Quantity

Moataz Hisham edited this page Mar 27, 2015 · 2 revisions

Background


 

I am a third-year Computer Engineering student, mainly interested in software development and engineering. I have been coding in Java for three years, took a C/C++ class, and have worked on algorithms, databases, and web and mobile applications, mainly in Java. I use Eclipse for Java and C++ and Android Studio for Android. I work on a Windows machine, but I used Linux for approximately one year, so switching operating systems if needed won't be a problem. Although I am new to Python, I found it very comfortable to learn and write, partly because its syntax felt familiar after C-style languages, and it became more powerful once I started using OOP with it. I use Anaconda and the Spyder IDE, have tried PyCharm, have worked with IPython notebooks, and have experimented with packages including Astropy and SciPy. I took three physics classes and a Statistics and Probability class, which served as an introduction to the nature of this project. In addition, I am taking further self-study courses on subjects related to this project before the program starts, to make sure my progress will meet the suggested timeline. I am generally interested in astronomy and cosmology and very interested in computational science. Other hobbies include reading, sci-fi movies and documentaries, video games, and open source hardware (Arduino and, recently, Raspberry Pi).

Project Details 

  • Proposal Abstract:
  • The Quantity class is powerful but doesn't have particularly useful support for uncertainties on quantities or other statistical approaches to thinking about numbers. A very straightforward way to make progress on this would be to create a subclass of Quantity called “Distribution” (or similar) that represents a probability density function of a quantity as Monte-Carlo-sampled arrays. This project would involve implementing this subclass, propagating operations while combining distributions, as well as tools for extracting useful information from such distributions. If there is time, this could also involve expanding this system to support common analytically-representable distributions such as Gaussian and Poisson distributions.
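As a rough illustration of the Monte Carlo idea in the abstract, the sketch below uses plain NumPy arrays (not the proposed Distribution class, whose design is still open) to represent two uncertain lengths as samples and to propagate them through an addition. The sample count, seed, and numerical values are arbitrary assumptions chosen for the example.

```python
import numpy as np

# Hypothetical example values; none of these come from Astropy itself.
rng = np.random.default_rng(42)
n_samples = 100_000

# Two uncertain lengths in metres, each represented by Monte Carlo samples.
a = rng.normal(loc=10.0, scale=0.3, size=n_samples)
b = rng.normal(loc=5.0, scale=0.4, size=n_samples)

# Propagating an operation is element-wise arithmetic on the samples.
total = a + b

# Summary statistics extracted from the propagated samples.
total_mean = total.mean()
total_std = total.std(ddof=1)

# For independent Gaussians the analytic spread is sqrt(0.3**2 + 0.4**2) = 0.5,
# so the sampled standard deviation should land close to that value.
print(f"mean = {total_mean:.3f} m, std = {total_std:.3f} m")
```

The same element-wise approach would apply to multiplication, division, or any other NumPy operation, which is what makes sampled arrays attractive for propagating uncertainties.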

    ---
    • Proposal Detailed Description:

    This project aims to provide a new class in Astropy that adds statistical functionality to quantities. The Distribution class will improve accuracy and clarity when interpreting data by enabling meaningful predictions and analyses of the given data.

    I will start this project by discussing with my mentors the best approach to developing the Distribution class. I will use the subclassing section of the Quantity docs as a guide. Distribution will benefit from the NumPy features that Quantity already uses, for example when implementing the mean and the variance (or standard deviation) of a distribution, so that it offers Quantity-like operations and is as easy as possible to use. scipy.stats and numpy.random may serve as references during the process.

    After that, I will add methods that implement the statistical logic: generating Gaussian/normal distributions for continuous variables and Poisson distributions for discrete variables, combining two distributions based on their means (and variances, if applicable), and calculating the PDF and CDF of a distribution.
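To make the statistical methods above concrete, here is a minimal sketch using only the Python standard library. The function names are hypothetical and are not the proposed Distribution API. Because a Poisson variable is discrete, it has a probability mass function (PMF) rather than a density. The combining rule shown is one possible reading of "combining two distributions": the sum of two independent Gaussians.

```python
import math


def normal_pdf(x, mean, std):
    """Probability density of a Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, std):
    """Cumulative probability of a Gaussian up to x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def poisson_pmf(k, rate):
    """Probability of exactly k events for a Poisson with the given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def poisson_cdf(k, rate):
    """Probability of at most k events."""
    return sum(poisson_pmf(i, rate) for i in range(k + 1))


def combine_independent_normals(mean_a, var_a, mean_b, var_b):
    """Mean and variance of the sum of two independent Gaussians."""
    return mean_a + mean_b, var_a + var_b


print(normal_pdf(0.0, 0.0, 1.0), normal_cdf(0.0, 0.0, 1.0))
print(poisson_pmf(0, 2.0), poisson_cdf(1, 2.0))
```

In practice scipy.stats already provides equivalents (norm.pdf, norm.cdf, poisson.pmf, poisson.cdf), so an implementation would likely delegate to those rather than re-derive the formulas.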

    Next, I’ll write an API for the Distribution class that follows familiar Astropy conventions, as stated before. Getting a Gaussian/normal distribution will look something like this:

    # To define a quantity
    from astropy import units as u
    q = [1., 2., 3.] * u.m

    # To define a distribution (proposed API, to be implemented by this project)
    from astropy import distribution as dist
    var = 0.5 * u.m
    size = 1000  # example value for the output shape

    # norm is a normal continuous random variable
    d = dist.norm(q, var=var, sample=size)

    q works as the mean, which is the center of the distribution. var is the spread of the distribution; because it carries units of metres here it acts as the standard deviation, whereas a true variance would carry units of m². sample is the output shape of the distribution. Later on, the user will be able to query any distribution easily by invoking methods that will look like this: some_distribution.mean() returns the mean, some_distribution.get_size() returns the size, and norm.pdf(d) returns the probability density function.

    Then, if time is available, some features will be added to make data interpretation even more accurate. Examples (if they have not already been added during the development process): normality checks, error analysis, and dealing with uncertainties.

     

    ---
    • Timeline:

    April 27 to May 24 (Community Bonding Period)

    Discuss with the mentors what the final class should look like, read the documentation for Quantity and the Astropy development best practices, become familiar with statistical implementations in Python, look at other classes with a similar concept, and come up with a general idea of how the new class will fit into Astropy.

    May 25 to June 1 (weeks 1 and 2)

    Begin working on the class, defining the essential methods and determining which other methods should be overridden to make sense for a distribution.

    June 8 (week 3)

    Test the class, find any bugs and any unnecessary or missing code, and make sure it follows the Astropy best practices.

    June 15, June 22 (weeks 4 and 5)

    Write the specific statistical methods that will help make the best use of the class, such as generating Gaussian/normal and Poisson distributions and then calculating the PDF and CDF.

    June 29 (week 6)

    Test the methods' functionality, add or remove methods as needed, and make sure everything is intact and working as it should.

    July 6, July 13 (weeks 7 and 8)

    Write an appropriate API to make the most of the class in a simple and user-friendly way, and define how the class components should interact with each other and with other classes.

    July 20 (week 9)

    Test the API in various situations and make sure everything falls into place and performs as intended.

    July 27 (week 10)

    Debug the project as a whole, making sure it performs as well as possible.

    August 3, August 10 (weeks 11 and 12)

    These weeks serve as a buffer in case something goes wrong or any delay happens. Otherwise, add other features that improve the class or make a valuable addition to it, such as standard error, normality checks, and uncertainty distributions.

    12-21 August

    Add the final pieces to the documentation to make future changes easy, and write a how-to guide to help users understand how to work with the Distribution class.
