CfAMeeting2011Minutes

jiffyclub edited this page Oct 18, 2011 · 14 revisions

Minutes for AstroPy Meeting 2011

The following minutes were taken by [[Thomas Robitaille | https://github.com/astrofrog]]. Please report any issues/mistakes to thomas.robitaille@gmail.com.

Wednesday 12th October

The following people were present locally:

  • Tom Aldcroft
  • Thomas Robitaille
  • Wolfgang Kerzendorf
  • Michael Droettboom
  • Nadia Dencheva
  • Mathieu Servillat
  • Erik Tollerud
  • Francesco Pierfederici
  • Perry Greenfield
  • Matt Davis
  • Brian Refsdal
  • Doug Burke
  • Steve Crawford
  • Erik Bray

The following people were present remotely:

  • James Turner
  • Kathleen Labrie
  • Adam Ginsburg
  • Eric Barron

Development workflow

The meeting started at 11am. The first discussion centered on the development workflow documented on the astropy website, and people were generally happy with the current workflow (but see later for some changes).

The discussion moved to whether the individual components of astropy should be installable separately (raised by Francesco). The reason this point was raised is that in some cases one might not want to have to install scipy, which is currently listed as a core requirement for astropy. It was decided that it would be too complex to allow components of astropy to be installable separately, and it seemed the main reason for the concern was the scipy dependency. Therefore, the suggestion was put forward that astropy should be installable and importable without scipy and without matplotlib, which would mean that only certain components of astropy get imported (those depending only on numpy). This was agreed to be better than astropy not working at all. The suggestion was put forward to have all imports of scipy and matplotlib go through:

import astropy.scipy
import astropy.matplotlib

and have astropy.scipy attempt to import scipy. If this doesn't work, then certain common functions from scipy could be included in astropy.scipy instead so that some code depending on only those scipy functions would still work. This suggestion was approved.
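A minimal sketch of the fallback import described above. The helper name `optional_import` and the flag `HAS_SCIPY` are hypothetical and chosen only for illustration; the minutes do not specify how astropy.scipy would be organized internally.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# What a wrapper module such as astropy.scipy might do at import time
# (assumption: the wrapper records whether the real package was found).
scipy = optional_import("scipy")
HAS_SCIPY = scipy is not None
```

Code that needs scipy could then check the flag and either use the real package or fall back to a bundled replacement function.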

It was emphasized that this does not mean that use of scipy should be discouraged, but scipy should only be used if the functionality is not available in numpy.

The idea that all changes should be done through pull requests was discussed. Mike pointed out that in the case of matplotlib, pull requests often languish waiting for review, and that the matplotlib developers decided some timeout mechanism was needed to force the merge if no one reviews them (it is then up to the users to see if the change causes problems). Otherwise there tends to be a backlog of pull requests. No one objected to this.

There were further discussions in the day that do impact on the workflow (see discussion on affiliated packages).

Licenses

We discussed what license we should use for astropy, and we talked about the usual issues of e.g. GPL virality, and acknowledged that using a non-GPL license means that we can't use any GPL code. Nevertheless, we decided that since we don't want to impose the GPL virality ourselves on packages using astropy, we would use a less restrictive license. The motion to rule out the GPL license was passed unanimously by all present.

The contenders being considered were the Apache License and the BSD and MIT licenses. No one in the room was immediately familiar enough with these to know for sure which was better, but the Apache license is more detailed and leaves less room for interpretation, so might be safer to go with. A StackOverflow discussion is available on the three licenses (here). A vote was deferred to Thursday.

Lunch discussion

A spontaneous discussion over lunch led to the conclusion that arbitrary metadata should be kept in dictionary-like objects, but that some classes (e.g. Spectrum) could have hard-coded attributes that may map to existing meta-data. What was perceived as dangerous is having attributes with 'dynamic' names, for example accessing any meta-data object through attribute notation, because in that case the attributes would need to be valid Python names. So things that are standard in an object (Spectrum.wave or Spectrum.flux) can have attribute notation, while arbitrary data could be available with item notation (Table['B-V']), and arbitrary meta-data would be in some kind of metadata object (e.g. s.metadata['OBSERVER']).
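The distinction can be illustrated with a toy class. This is only a sketch: the constructor signature and the metadata values are made up for the example and are not a proposed API.

```python
class Spectrum:
    """Toy spectrum: fixed attributes plus a free-form metadata mapping."""

    def __init__(self, wave, flux, metadata=None):
        self.wave = wave                      # standard quantity: attribute access
        self.flux = flux                      # standard quantity: attribute access
        self.metadata = dict(metadata or {})  # arbitrary keys: item access


# Illustrative values only.
s = Spectrum(wave=[4000.0, 4001.0], flux=[1.2, 1.3],
             metadata={"OBSERVER": "A. Nonymous", "DATE-OBS": "2011-10-12"})

observer = s.metadata["OBSERVER"]
# "DATE-OBS" is not a valid Python name, so it could never be written
# as s.DATE-OBS; item notation has no such restriction.
date_obs = s.metadata["DATE-OBS"]
```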

Affiliated packages/Areas of development

We discussed what initial components of astropy we should focus on. There were detailed discussions for each component.

For example, for constants, the decision of what units to use was discussed, and it was decided that since it would not be easy to attach units to the variables (except via the docs, which the user may or may not read), the unit system would be made explicit by using submodules for each system in astropy.constants (e.g. astropy.constants.cgs). One of these would be the base one, e.g. cgs, and the others would modify the base value (e.g. the S.I. msun would be set to 0.001 times the c.g.s. value, since 1 g = 0.001 kg).
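One way to picture a base system plus derived systems is sketched below. The use of `SimpleNamespace`, the name `_CGS_TO_SI`, and the rounded constant values are assumptions for illustration; the minutes only fix the idea of one submodule per unit system.

```python
from types import SimpleNamespace

# Base system: c.g.s. values (rounded, for illustration only).
cgs = SimpleNamespace(
    msun=1.989e33,   # solar mass in g
    c=2.998e10,      # speed of light in cm/s
)

# Conversion factor from c.g.s. to S.I. for each constant (hypothetical table).
_CGS_TO_SI = {
    "msun": 1e-3,    # g -> kg
    "c": 1e-2,       # cm/s -> m/s
}

# Derived system: every value is the base value times its factor.
si = SimpleNamespace(
    **{name: getattr(cgs, name) * factor for name, factor in _CGS_TO_SI.items()}
)
```

With this layout a value only has to be updated in the base system, and `si.msun` or `cgs.msun` makes the unit system visible at the point of use.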

For tables, the idea of a base Table class containing a structured array and meta-data was seen to be a good idea, and a registration system for read/write functions was also approved.
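A registration system for readers could look roughly like the following sketch. The method names `register_reader` and `read` are hypothetical, and a list of tuples stands in for the structured array so that the example has no dependencies.

```python
class Table:
    """Toy table: rows of data plus a metadata dictionary."""

    _readers = {}  # format name -> reader function

    def __init__(self, rows, meta=None):
        self.rows = rows
        self.meta = dict(meta or {})

    @classmethod
    def register_reader(cls, fmt, func):
        """Associate a format name with a function that returns a Table."""
        cls._readers[fmt] = func

    @classmethod
    def read(cls, source, fmt):
        """Dispatch to whichever reader was registered for ``fmt``."""
        if fmt not in cls._readers:
            raise ValueError("no reader registered for format %r" % fmt)
        return cls._readers[fmt](source)


def read_csv_text(text):
    """Parse comma-separated text whose first line holds column names."""
    lines = text.strip().splitlines()
    names = lines[0].split(",")
    rows = [tuple(line.split(",")) for line in lines[1:]]
    return Table(rows, meta={"names": names})


Table.register_reader("csv", read_csv_text)
t = Table.read("a,b\n1,2\n3,4", fmt="csv")
```

The point of the registry is that new formats can be added by other packages without modifying the Table class itself.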

For WCS, Perry pointed out we may want to have non-FITS WCS in future, so the current WCS implementation in pywcs should only be a part of astropy.wcs.

Finally, a detailed discussion on Spectra, Image, and similar classes took place. The issue was that having something like CCDImage was thought to be maybe too specific initially, and it was not clear how to deal with n-dimensional datasets that aren't technically images. The decision was to have a base data class (with no good ideas and many bad ideas for names, e.g. NDImage, NDThingy, etc.) which would contain an arbitrary n-dimensional Numpy array, a generalized WCS object, meta-data, and units. Then we would have a Spectrum and an Image class which would inherit from this base class and would each define a useful subset. For example, Spectrum would be one-dimensional datasets with wavelength/frequency/energy on the one axis, and Image would need at least two spatial dimensions. Of course, there are other types of data (radio cubes, event lists, integral field spectrographs), which can have sub-classes defined in future too.

In addition to deciding on areas of development, we designated one or more participants who after the meeting will try and coordinate development with all other people who have registered interest for those areas on the developer list. These are not meant to be long-term coordinators, but the aim is that leaving the workshop, we have at least one person in charge of coordinating things afterwards, or nominating another coordinator if necessary, so that we know things will get moving.

The final list of areas and temporary coordinators is the following:

  • astropy.constants (with cgs and si sub-modules) [Erik T., Thomas R.]
  • astropy.units - unit conversion routines [Perry G.]
  • astropy.io.fits - FITS file I/O [Erik B.]
  • astropy.io.vo - VO tables and communication (both technically I/O) [Mike D.]
  • astropy.table - a base class for tables (and registry facility for read/write functions) [Tom A. and Thomas R.]
  • astropy.wcs - WCS transformations [Mike D. and Nadia D.]
  • astropy.coords - coordinate transformations (ultimately including Alt/Az type conversions that depend on time) [Erik T.]
  • astropy.time - handling of astronomical times (e.g. julian dates, etc.) [Wolfgang K. and Erik T.]
  • astropy.nddata - arbitrary n-dimensional datasets with WCS [Steve C.]
  • astropy.image - image-specific routines [Thomas R.]
  • astropy.spectrum - spectrum-specific routines [Wolfgang K., Erik T., Adam G.]

Mailing list

Following this discussion, it was decided that discussions relating to development should be carried out on astropy-dev (voted unanimously). A suggestion was made to use google groups instead of scipy.org (because of the better search capabilities) - 2/3 of participants voted for this, 1/3 did not care, and no one voted against, so that motion was also passed. The list has been created here.

Google Summer of Code

We briefly discussed the possibility of hiring a student or students via the Google Summer of Code. The consensus was that this might be easier if the student is known in advance, and is familiar with astronomy. Either way, the person hiring the student would probably need to invest a bit of time helping the student, so this is more an issue of whether someone is prepared to spend that time.

Long-term transition

We discussed how people plan to maintain existing packages while merging part of their packages into astropy, and it seemed that a lot of people agreed that having to maintain two repositories is bad and should be kept to a minimum. For PyFITS, development could initially continue in the existing svn repository in the near future, with changes pushed from time to time to astropy, but ultimately, pyfits would be discontinued, and one could still consider creating a compatibility pyfits package which simply imports from astropy.io.fits.

The transition to astropy was also seen as a good opportunity to break APIs if necessary, and to remove deprecated code, with the possibility for teams to provide 'compatibility packages', e.g. for pyfits, which would be a very small package whose only purpose is to map the old pyfits API to astropy.io.fits so that old scripts still work without having to replace 'import pyfits' by 'import astropy.io.fits' and changing APIs (if any changes exist).
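The mechanism behind such a compatibility package can be sketched with module aliasing. The helper `alias_module` is hypothetical, and the standard-library `json` module stands in for astropy.io.fits so that the example runs without astropy installed; a real shim might instead be a tiny package that simply re-exports names from the new location.

```python
import importlib
import sys


def alias_module(old_name, new_name):
    """Make ``import old_name`` return the module called ``new_name``."""
    module = importlib.import_module(new_name)
    sys.modules[old_name] = module
    return module


# Demonstrated with the standard library; a pyfits shim would alias
# "pyfits" to "astropy.io.fits" instead (assumption, not a decided design).
alias_module("old_json", "json")

import old_json  # resolved through sys.modules, no file named old_json needed

encoded = old_json.dumps({"a": 1})
```

Old scripts keep their original import line, while all the actual code lives in one place.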

Milestones

We discussed milestones and a potential timeline for the project, and the consensus was that no exact deadlines should be set, but that milestones in terms of what is achieved by each release might be useful. For example, it would be nice to be able to have a pre-release to the mailing list with at least pyfits and pywcs already integrated into astropy. Several people raised the point that we don't want to go too public too soon - we need to slowly increase the user base, and only go fully public once many of the initial areas of development have been included in the core and extensively tested.

Workflow (continued)

The discussion returned to an earlier point about the development workflow. It seems that the distinction between affiliated packages that are meant to be merged into the core and ones that are not necessarily meant to be was causing some confusion, so it was decided that the term 'affiliated package' should be used to indicate packages that are not immediately being merged into the core (but may be some day), and those would use the template package provided. These would be installable as a group using an installation script.

Things that are going to shortly be merged into the core can just be added to a fork of the core package, and submitted via pull requests. This includes e.g. pyfits, pywcs, which are already established standalone packages, and which don't really need to be converted to a different package format before being considered to be merged. Therefore, one modification compared to before is that people working on code to be shortly included in astropy can work directly on a fork of astropy without creating a whole new affiliated package.

Documentation

We discussed briefly things to do with the organization of the documentation, and we agreed with the idea of providing a scipy-like reference (or API doc), which can easily be done with autosummary in sphinx. This could live in e.g. docs/reference or docs/api.
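As a rough sketch, enabling autosummary in a Sphinx project comes down to a few lines in the configuration file. The fragment below assumes a standard docs/conf.py; the rest of the configuration is omitted.

```python
# docs/conf.py (fragment)

# autodoc pulls docstrings from the code; autosummary builds summary
# tables and, optionally, one stub page per documented object.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
]

# Ask autosummary to generate the stub pages at build time.
autosummary_generate = True
```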

Thursday 13th October

The following people were present locally:

  • Erik Bray
  • Brian Refsdal
  • Doug Burke
  • Matt Davis
  • Perry Greenfield
  • Erik Tollerud
  • Nadia Dencheva
  • Adrian Price-Whelan
  • Mike Droettboom
  • Thomas Robitaille
  • Tom Aldcroft
  • Mathieu Servillat
  • Steve Crawford

The following people were present remotely:

  • James Turner
  • Kathleen Labrie
  • Adam Ginsburg
  • Eric Barron
  • Paul Barrett

Testing framework

The discussion on the mailing list regarding testing frameworks (unittest vs. nose vs. py.test) was never resolved, and so we discussed this further to reach an agreement. While unittest is no longer a contender, we discussed the pros/cons of nose and py.test:

  • py.test has a compatibility plugin for nose, but only for basic functionality, so we should probably not rely on that too much.
  • py.test can create its own standalone testing script that doesn't require py.test, and this was perceived as being very useful, as it would allow all users to easily run tests without installing an extra dependency.
  • py.test installation is easy, much like nose (py.test used to be harder to install)
  • nose compares numpy arrays nicely, though some people mentioned that we can make py.test behave that way too.
  • the py.test documentation for writing plugins is very good, and clearer than nose.
  • py.test gives more information than nose when tests fail.

Ultimately, because it was felt there was a small advantage to py.test, we settled on py.test (unless exploring it further later in the day or on Friday reveals some major issues).

Mike then raised the question of whether we should install the tests into the source tree, or whether they should be kept separate in a tests/ directory outside the astropy/ tree. In the first case, it would then be possible to run astropy.test() which would be invaluable for installations of astropy where the source directory is not preserved. We agreed that installing the tests in the source tree is a good plan as it keeps tests close to the code they are testing.

On a side note, the default should probably not be to run all tests, but have an option to pass to astropy.test() to specify which subset of tests to run.
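A sketch of how a subset option might be turned into arguments for the test runner. The function name `build_test_args` and its parameters are hypothetical. The minutes suggest the default should not be to run everything, but do not say what the default subset should be, so here `None` simply selects the whole package.

```python
import os


def build_test_args(root, package=None, extra_args=()):
    """Return the argument list to hand to the test runner.

    root       -- directory containing the installed package
    package    -- dotted sub-package name such as "io.fits", or None for all
    extra_args -- additional runner options passed through unchanged
    """
    path = root
    if package is not None:
        # "io.fits" -> <root>/io/fits, so only that sub-tree is collected.
        path = os.path.join(root, *package.split("."))
    return [path] + list(extra_args)


all_tests = build_test_args("astropy")
fits_only = build_test_args("astropy", package="io.fits", extra_args=["-x"])
```

An astropy.test() function could pass such a list to `pytest.main`, which accepts a list of command-line arguments.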

Test data

The discussion shifted to how we should include data needed for tests, since GitHub has size limits on repositories (300 MB). The following scheme was proposed:

  • Small data files would be included alongside the tests
  • Larger data files would be stored on a server, and would be retrieved by the tests.

Several issues needed addressing with the second type of file, the biggest of which was the fact that test data depend on the specific revision of the astropy repository. Therefore, a scheme was proposed whereby tests would retrieve files from a server via hashes, which would ensure that the correct version of the file would be returned. These files could be retrieved using the following type of URL:

http://www.astropy.org/test-data/94935ac31d585f68041c08f87d1a19d4

Later in the day, we realized that since it would be easier to map sub-domains to different machines/clusters than the GitHub web repository, one could use instead:

http://data.astropy.org/94935ac31d585f68041c08f87d1a19d4

The exact infrastructure to support this was left open-ended. Several options were discussed, including:

But this is not urgent, since while the URLs should be permanent, the infrastructure to support it can be changed at any time.
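The hash-based retrieval scheme can be sketched as follows. It is an assumption that the hash is the MD5 digest of the file contents (the example hash in the URLs has the length of an MD5 digest, but the minutes do not name the algorithm), and the function names are hypothetical.

```python
import hashlib

DATA_SERVER = "http://data.astropy.org/"


def content_hash(content):
    """Return the hex MD5 digest of a bytes object (assumed algorithm)."""
    return hashlib.md5(content).hexdigest()


def data_url(file_hash):
    """Build the retrieval URL for a file identified by its hash."""
    return DATA_SERVER + file_hash


def verify(content, expected_hash):
    """Check that downloaded bytes match the hash they were requested by."""
    return content_hash(content) == expected_hash


url = data_url("94935ac31d585f68041c08f87d1a19d4")
```

Because the identifier is derived from the contents, a test pinned to a given hash always receives the same bytes, whichever revision of the repository it belongs to.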

During this discussion, Mike noted that in some specific cases, large data files can be generated by the testing scripts (but this only covers some test cases).

Testing guidelines

It was agreed that it is unrealistic to expect 100% test coverage of public functions/methods with unit tests, and that even 100% coverage did not guarantee all functionality was being tested, so a strict coverage requirement does not really make sense, but that projects with no testing at all would clearly need to implement some tests before being considered for merging.

The use of regression tests was mentioned, and it was agreed that when bugs are fixed in astropy, at least one regression test should be included to ensure that the bug is not re-introduced in future, and the URL of the ticket should be included in the regression test for the record.
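A regression test in the py.test style might look like the sketch below. The function `fraction_above` and the bug it illustrates are invented for the example, and the ticket URL is left as a placeholder.

```python
def fraction_above(values, threshold):
    """Return the fraction of values strictly greater than threshold."""
    if not values:
        # The fix: an empty input previously raised ZeroDivisionError.
        return 0.0
    return sum(1 for v in values if v > threshold) / float(len(values))


def test_fraction_above_empty_input():
    """Regression test: empty input must return 0.0 instead of raising.

    Ticket: <URL of the ticket goes here>
    """
    assert fraction_above([], 1.0) == 0.0
```

Keeping the ticket reference in the docstring lets anyone who sees the test fail later find the original bug report.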

Some tests may compare results to results from external libraries/programs.

Core package layout

It was agreed that cextern/ is for C code only, and that initially at least, no Fortran extensions will be allowed in astropy (since installation can be tricky).

Mike suggested we remove plot_directive from our version of numpydoc and use the matplotlib one directly instead if needed.

Kathleen asked whether we should include the Sphinx _build/html directory in the git repository, but the general consensus was that this should not be the case, because this build is already in the astropy/astropy.github.com repository so we should not duplicate storing the documentation build.

The discussion then centered around the question of what version of the documentation should be provided for users. The final decision was that the default documentation should be the latest stable version, while both the latest developer version and stable release versions (including old ones) should be available, but not the default. How this is achieved, and whether we would use automated build scripts, was left to be decided at a later time.

Information such as a reference list for functions/methods should be kept alongside the rest of the documentation, not in a separate api/ or reference/ directory.

Tom A. raised the issue that matplotlib functions are not really searchable from Google - the first few results often just link to the front page of the website.

Mike asked whether we might want to consider a more modular layout for astropy, so that individual components can easily be extracted from the rest of the package. However, it seemed the general opinion was to just go with the current style of layout, but that by putting the tests inside each sub-module, we could help make things more modular. In the specific case of pywcs, it was agreed that since only certain source files from the wcslib library are included with pywcs, this could be included in astropy/wcs rather than cextern.

Erik B. raised the point that astropy should be importable without building (useful if e.g. setup.py needs to use helper functions from the astropy tree). After a little discussion, this was agreed on.

Mike raised the issue that wrapping C code with Cython is a pain, but works fine with ctypes. Since ctypes is in the python standard library, this doesn't require any additional packages. While the use of Cython should be encouraged for people who want to speed up Python code, we should allow the use of ctypes from experienced developers.
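For reference, calling an existing C function through ctypes takes only a few lines. The sketch below calls `strlen` from the system C library and assumes a POSIX platform (Linux or macOS); it is not taken from any astropy code.

```python
import ctypes
import ctypes.util

# Locate the C library; if the lookup returns None, CDLL(None) falls back
# to the symbols already loaded into the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"astropy")
```

Declaring `argtypes` and `restype` is what makes the call safe: without them ctypes has to guess how to convert arguments and return values.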

We discussed whether generated Cython code should be included in the git repository, and the decision was that it should not be included in the master branch, but could potentially be included in 'release' branches. It was decided to add *.c to the .gitignore file so that generated C code is not accidentally committed.

Regarding testing, we agreed that astropy/modulename/tests/ or astropy/modulename/test.py can contain specific unit and regression tests for modulename, and that astropy/tests/ would contain interoperability tests using multiple modules.

Regarding coding styles, it was agreed that one could relax slightly the line length requirement in PEP8 in some cases where breaking the line would make the code less readable.

License

The discussion from Wednesday regarding licenses was continued, with two main options emerging - either the same license as scipy (modified BSD), or the Apache license, which basically adds clauses relating to patenting (which a number of participants felt was more appropriate for large companies such as Google). A number of people felt that since the scipy community, and Enthought (a software company), seem to be happy with a modified BSD license for scipy, the modified BSD license would be fine.

A vote was carried out, with all but one in favor of modified BSD, and one in favor of the Apache license, so the modified BSD license was adopted.

Afternoon splinter sessions

In the afternoon, participants split up into multiple groups to discuss various topics, and start implementing decisions from the morning. The following sections include results/decisions from each group.

Data/Spectral class

One group of participants (Wolfgang K., Erik T., Steve C., Perry G., Nadia D., and others) focused on the low-level API for generalized data objects, as well as for spectra and images. The NDThingy class was renamed to NDData following a vote (!). A Spectral class would inherit from NDData, to represent any data with a dispersion dimension, then Spectrum1D and Spectrum2D, which are common types of spectra, would inherit from Spectral.

The attributes for NDData would be:

  • .data - a numpy ND-array
  • .wcs - callable object
  • .meta - dict-like (subclass of mapping)
  • .units - of the values in the matrix, e.g. counts, axis description in WCS
  • .error - agnostic for now -> maybe error module.
  • .mask - agnostic for now, linked slightly to masked array

Spectral would be a subclass of NDData with one and only one spectral dimension.

The additional attributes for Spectrum1D would be:

  • .disp (instead of .wave?)
  • .flux
  • .error
  • .disp_unit - alias for wcs.unit?
  • .flux_unit - alias for units (TBD)
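The class hierarchy and attributes listed above can be sketched as follows. This is an illustration under stated assumptions, not the agreed API: plain Python lists stand in for numpy arrays, the WCS is reduced to a callable mapping a pixel index to a dispersion value, and the way `.disp` and `.flux` are derived is a guess.

```python
class NDData:
    """Base container for an n-dimensional dataset."""

    def __init__(self, data, wcs=None, meta=None, units=None,
                 error=None, mask=None):
        self.data = data              # array-like (a numpy ndarray in practice)
        self.wcs = wcs                # callable: pixel coordinate -> world coordinate
        self.meta = dict(meta or {})  # dict-like metadata
        self.units = units            # units of the values in data
        self.error = error            # representation left open
        self.mask = mask              # representation left open


class Spectral(NDData):
    """Data with one and only one spectral (dispersion) dimension."""


class Spectrum1D(Spectral):
    """One-dimensional spectrum."""

    @property
    def flux(self):
        return self.data

    @property
    def disp(self):
        # Evaluate the WCS at every pixel index to get the dispersion axis.
        return [self.wcs(i) for i in range(len(self.data))]


# Made-up linear wavelength solution: 4000 + 2 * pixel.
spec = Spectrum1D(data=[1.0, 1.5, 0.8], wcs=lambda pix: 4000.0 + 2.0 * pix,
                  units="counts")
```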

Testing guidelines

Thomas R. and Matt D. worked on writing the testing guidelines, which give examples for py.test, and some preliminary ideas on how to access remote data were included. The guidelines will appear on the http://www.astropy.org website in the coming week once these details are ironed out.

OrderedDict

Tom A. has worked on back-porting the built-in ordered dictionary class from Python 2.7 to Python 2.6 (the minimum Python version we require) and gave an update on this, including backporting the tests for this.

Next Meeting

A brief discussion was carried out regarding the location/timing of the next meeting. Everyone agreed a meeting in the spring will be good, although no decision was reached regarding location. Thomas R. pointed out that at the time of the CfA vs STScI poll, we mentioned that the winner of the poll would hold a meeting in the fall (which was this meeting) and the other could hold a meeting in the spring, so it might make sense to have a spring meeting at STScI, and consider other locations (e.g. on the West Coast) for a subsequent meeting. No decision was reached.

Logo/Website competition

A logo/website design competition will be announced on the mailing list, with a prize TBD.

Friday 14th October

Significant progress was made on coding and API design on Friday. Progress can be tracked through the pull request page at https://github.com/astropy/astropy/pulls (click on 'Closed' to see all the pull requests that have already been merged in).