
GSOC 2014 Application Simon Liedtke: New remote services in astroquery package

Simon Liedtke edited this page Mar 18, 2014 · 8 revisions

Application for GSOC 2014

Student Information

Name: Simon Liedtke
Email: liedtke.simon@googlemail.com
IRC handle: derdon@irc.freenode.net
GitHub account: derdon
Jabber account: derdon@jabber.ccc.de
Blog: http://derdon.github.io/blog/

University Information

University: University of Bremen
Major: Computer Science
Current Semester: 4
Expected Graduation Date: Summer 2015
Degree: Bachelor of Science (BSc)

Project Proposal Information

Title: AstroPy: New remote services in astroquery package

Abstract

Astropy is a Python library for astronomy and astrophysics. One of its affiliated packages is astroquery, which offers APIs to web services for querying astronomical data. Every service has its own API because every service targets a different kind of data and thus has a different set of query options. Some services can only be used to search remote databases (e.g. IRSA), while others also allow downloading the associated data (e.g. ESO).

Currently, astroquery supports 17 web services (see the section "List of Modules" at the astroquery page). The main goal is to add support for more services to the astroquery package. I will also fix open astroquery issues to improve stability and to extend existing web service interfaces.

Detailed Description

Milestones:

May 12 — May 18 (1 week): Community Bonding Period. Read documentation, especially the testing guide to get more familiar with testing remote services and the template module to find out how modules for remote services can be implemented. Also read the implementation of already supported remote services to get a better understanding of how to add support for new services.
May 19 — May 25 (1 week): Add support for the remote service CDS xMatch. See issue #49. The documentation contains a usage example which makes this service a good start.
May 26 — June 1 (1 week): Add SKYVIEW support. See issue #215. The issue points to a SkyView Query Form which will be used to query the respective server.
June 2 — June 8 (1 week): Add support for box queries for SIMBAD. See issue #216. There is already support for querying the SIMBAD service in different ways, but there is no support to query SIMBAD by criteria yet.
June 9 — June 15 (1 week): Add service Atomic Line List. See issue #290. There is a web form to query Atomic Line List on the website of the University of Kentucky which will be used to query this service.
June 16 — June 29 (2 weeks): Use BeautifulSoup instead of lxml in the ESO package. This finishes the work I have started in the pull request #284. lxml uses C bindings to improve performance, but the drawback is that building the library manually can lead to issues (see issue #278 for example). BeautifulSoup, on the other hand, is pure Python and therefore does not cause this kind of installation trouble.
June 27: Mid-term evaluation: Check stability. All tests must pass and every part of the public API must be documented.
June 30 — July 13 (2 weeks): Add multi-object query interface. See issue #228. Services such as Vizier and SIMBAD allow searches for multiple objects. Sending many single-object requests within a short time frame may be interpreted by the affected web server as a DoS attack. Bundling multiple requests into one avoids this problem.
July 14 — July 27 (2 weeks): Add the new services Solar System Object Search and JPL Solar System Dynamics. See issue #222. To query them, the web forms by the Canadian Astronomy Data Centre and by the California Institute of Technology will be used, respectively.
July 28 — August 10 (2 weeks): Keep server URLs up-to-date. See issue #109.
August 11: Suggested 'pencils down' date. Add more tests, improve the documentation, refactor and fix bugs. If there hasn't been time before, support for coveralls will be added here. This is also a buffer zone if features require more time than planned.
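
The multi-object milestone (issue #228) comes down to sending one request for many targets instead of one request per target. The following is only a minimal sketch of that idea, not astroquery code: `batched`, `query_in_batches`, and the `query_many` callable (a stand-in for whatever bulk endpoint a service offers) are hypothetical names chosen for illustration, and the batch size is an arbitrary assumption.

```python
from itertools import islice


def batched(targets, batch_size):
    """Yield successive lists holding at most batch_size targets each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(targets)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def query_in_batches(targets, query_many, batch_size=100):
    """Call query_many once per batch and merge the per-batch results.

    query_many is a hypothetical callable: it takes a list of object
    names and returns a dict mapping each name to its result.
    """
    results = {}
    for batch in batched(targets, batch_size):
        results.update(query_many(batch))
    return results


if __name__ == "__main__":
    requests_sent = []

    def fake_query_many(names):
        # Stand-in for a remote service; records each "request" it receives.
        requests_sent.append(list(names))
        return {name: len(name) for name in names}

    names = ["M31", "M42", "NGC 1068", "M1", "M87"]
    merged = query_in_batches(names, fake_query_many, batch_size=2)
    print(len(requests_sent), "requests for", len(merged), "targets")
```

Passing the bulk call in as a parameter keeps the batching logic independent of any particular service, so the same helper could wrap different remote interfaces.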

Code Sample(s)

A pull request I submitted to the astroquery repository can be found at https://github.com/astropy/astroquery/pull/286. I changed the (X)HTML parsing library from lxml to BeautifulSoup, as requested in issue #284.

I was interested in esoteric programming languages and therefore wrote an interpreter for the language Chef. To simplify developing and debugging programs written in Befunge, I wrote an interactive shell for it, see befungeshell.

To exercise my newly developed skills in the programming language Go, I wrote a (still experimental) library for parsing and creating images in the Netpbm format; see netpbm for the code.

Writing code is not the only way to contribute to open source projects: To help the developers of a project with improving it, it is important to report bugs the users have encountered. An example of a bug I have reported is from the urwid project: there is an issue with putting an application into the background and fetching it back into the foreground. The complete bug report can be found here: https://github.com/wardi/urwid/issues/25.

The source code for my GSoC 2013 contributions can be found at https://github.com/sunpy/sunpy/tree/master/sunpy/database and the documentation for it is split into the database guide and the database API documentation. The database acts as a (possibly finite) cache. LRU is not the only caching algorithm available: a different one such as LFU can be selected, and a custom caching algorithm can be added easily. Other notable features include an undo/redo manager and the ability to detect queries that have already been run, in which case the local database is queried instead of a remote data server to save bandwidth.
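
To illustrate what a size-limited cache with a selectable eviction algorithm can look like, here is a minimal sketch. It is not the SunPy database implementation: the class names `LRUPolicy`, `LFUPolicy`, and `BoundedCache` and the `touch`/`evict` interface are assumptions made up for this example.

```python
from collections import OrderedDict


class LRUPolicy:
    """Evict the key that was used least recently."""

    def __init__(self):
        self._order = OrderedDict()

    def touch(self, key):
        # Record a use by moving the key to the "most recent" end.
        self._order[key] = None
        self._order.move_to_end(key)

    def evict(self):
        key, _ = self._order.popitem(last=False)
        return key


class LFUPolicy:
    """Evict the key that was used least often."""

    def __init__(self):
        self._counts = {}

    def touch(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1

    def evict(self):
        # On a tie, the key that was inserted first is evicted.
        key = min(self._counts, key=self._counts.get)
        del self._counts[key]
        return key


class BoundedCache:
    """Key/value store holding at most `capacity` entries."""

    def __init__(self, capacity, policy):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._policy = policy
        self._data = {}

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._policy.touch(key)
        return self._data[key]

    def put(self, key, value):
        # Make room first if a new key would exceed the capacity.
        if key not in self._data and len(self._data) >= self._capacity:
            del self._data[self._policy.evict()]
        self._data[key] = value
        self._policy.touch(key)
```

A custom algorithm only needs to provide the same two methods, for example `BoundedCache(100, MyPolicy())` with a hypothetical `MyPolicy` class.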

Biography

I am a 22-year-old student studying computer science in Bremen, Germany. When I was 12 years old, I dove into the world of coding and became interested in the Internet and the World Wide Web. I wanted to find out how web pages are made, so I learned HTML, CSS, and JavaScript. When I read documentation about JavaScript, I discovered that it runs on the client side, in the browser, and thus cannot be used for interacting with a database on the web server (back then, there was no fancy thing such as Node.js). So I became curious again and learned PHP (it seemed to me that there was no alternative). Five years ago, I read about the programming language Python and found out that PHP has many flaws which I had not noticed while learning it because I was a beginner back then. The time I started learning Python was also the time I started supporting the German Python community: I am an active member both in the IRC channel #python.de at freenode and in the German Python forum. When I used Python for web programming, I used Werkzeug and Flask as web frameworks and Genshi as the template engine.

Last year I contributed to SunPy as a GSoC student. During that project, I added the database package, which makes it possible to save queried and retrieved data in a local or remote database. More information can be found in the "Code Sample(s)" section.

I have thorough experience with git, sphinx, the testing framework py.test and working on open source projects in general. Because of my contributions to SunPy, I also have advanced experience with developing database applications using SQLAlchemy. Additionally, I have basic experience with requests and think I can extend my skills with this library easily.

Other Schedule Information

I have classes until 16 May, and the exam period runs from 19 May to 27 June. I can't give the exact dates of my exams yet, but I will probably have finished them before 27 June. It may be useful to know that I'm currently studying as an ERASMUS exchange student at BME (Budapest University of Technology and Economics). After the exam period I will have more time and will be able to work 40h/week, or even more if I need to compensate for the previous weeks.
