# Accessing APIs in Astronomy

APIs, or Application Programming Interfaces, are sets of rules and tools that let one program talk to another application. For our purposes here, an API will allow us to request data from astronomical catalogs and websites through scripts in Python.

First, we'll load the packages that we'll need. These are **urllib** and **urllib2** to access URLs; **requests**, which is a more convenient way to do the same; **StringIO** to treat strings as if they were files; **BeautifulSoup** to parse HTML files; **simplejson** to parse JSON files; and **requests_oauthlib** to handle authorization for API requests. We'll also load **matplotlib** for a quick plot and **astropy** to deal with VOTables.

In [None]:
%matplotlib inline

import urllib2, urllib
import requests
from StringIO import StringIO
from astropy.table import Table
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import simplejson
from requests_oauthlib import OAuth1 # used for twitter's api

## IRSA API

Let's look at an example by considering the [IRSA (NASA/IPAC Infrared Science Archive) API](http://irsa.ipac.caltech.edu/applications/Gator/GatorAid/irsa/catsearch.html).

Below, we use the **urllib** and **urllib2** modules loaded above, which allow us to work with URLs and send requests. (We'll see a better way to do this later on; see the ADS section.)
The first step is to create the URL we need to access.

In [None]:
params = {}
params['catalog'] = 'fp_psc'
params['spatial'] = 'cone'
params['objstr'] = 'M31'
params['radius'] = 300
params['radunits'] = 'arcsec'
params['outfmt'] = 3
url_values = urllib.urlencode(params)
print url_values
baseurl = 'http://irsa.ipac.caltech.edu/cgi-bin/Gator/nph-query'
url = baseurl + '?' + url_values
print url
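
The cell above is written for Python 2. As a minimal sketch, assuming Python 3 (where `urlencode` lives in `urllib.parse`), the same URL can be built and then decoded again to check what was actually encoded. The parameter names and values are the ones used above; nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

baseurl = 'http://irsa.ipac.caltech.edu/cgi-bin/Gator/nph-query'
params = {
    'catalog': 'fp_psc',
    'spatial': 'cone',
    'objstr': 'M31',
    'radius': 300,
    'radunits': 'arcsec',
    'outfmt': 3,
}

# urlencode escapes each key and value and joins the pairs with '&'
url = baseurl + '?' + urlencode(params)
print(url)

# Round trip: split off the query string and decode it back into a dict.
# parse_qs returns every value as a list of strings.
decoded = parse_qs(urlsplit(url).query)
print(decoded['objstr'], decoded['radius'])
```

Decoding the query string this way is a quick check that special characters in a parameter were escaped as intended before the request goes out.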

In [None]:
response = urllib2.urlopen(url)
u = response.read()

The response is a VOTable held in a single string. Rather than work with that string directly, we can save it to a file to be used elsewhere, or parse it in Python. We'll use the **StringIO** module to convert the string to a file-like object, which the **astropy.table** module can read into a Table object.

In [None]:
temp = StringIO(u)
t = Table.read(temp, format='votable')
print t
print t.colnames
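
To illustrate what "file-like object" means here, the sketch below wraps in-memory bytes in `io.BytesIO` (the Python 3 counterpart of the `StringIO` call above for byte strings) and hands it to a parser that expects a file. It swaps in the standard library's `xml.etree.ElementTree` for `astropy.table` so that it runs without extra packages, and the tiny VOTable is hand-written for illustration, not real IRSA output.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written stand-in for a VOTable response (not real IRSA output)
votable_bytes = b"""<?xml version="1.0"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.0</TD><TD>41.0</TD></TR>
          <TR><TD>10.5</TD><TD>41.5</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

ns = {'v': 'http://www.ivoa.net/xml/VOTable/v1.3'}

# BytesIO gives the bytes a read() method, so the parser treats them as a file
file_like = io.BytesIO(votable_bytes)
root = ET.parse(file_like).getroot()

# Column names come from the FIELD elements, rows from the TR/TD elements
colnames = [f.get('name') for f in root.iterfind('.//v:FIELD', namespaces=ns)]
rows = [[float(td.text) for td in tr.iterfind('v:TD', namespaces=ns)]
        for tr in root.iterfind('.//v:TR', namespaces=ns)]

print(colnames)
print(rows)
```

For real catalog output, `Table.read(..., format='votable')` as used above remains the more practical route, since it also handles units, data types, and missing values.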

You can now do all the usual things you would want with this table of sources, such as creating a plot.

In [None]:
plt.scatter(t['ra'], t['dec'], c=t['j_k'])
plt.gca().invert_xaxis()
plt.xlabel('RA (deg)')
plt.ylabel('Dec (deg)')
color = plt.colorbar()
color.set_label('J-K Color')

## Simbad API

Now, let's have a look at [Simbad](http://simbad.u-strasbg.fr/simbad/) and use their [API](http://simbad.u-strasbg.fr/simbad/sim-help?Page=sim-url). There are a variety of URLs available to us depending on what we want to do. We'll be using the most general one, which lets us send a custom-made script.

In [None]:
baseurl = 'http://simbad.u-strasbg.fr/simbad/sim-script'
script = """output console=off script=off
votable v1 {
MAIN_ID
COO
RA(d)
DEC(d)
OTYPE
}
votable open v1
sirius
query id V4046 Sgr
query coo 11 01 -34 42 radius=30s
votable close
"""
script = 'script=' + urllib.quote_plus(script)
url = baseurl + '?' +  script
print url

Up to now, with the parameters set in the URL, we've been issuing HTTP GET requests. That still works here, but very long URLs can cause problems. To avoid this, we'll send the parameters via an HTTP POST request, which carries them in the body of the request rather than in the URL. Some websites treat GET and POST requests differently, but for our purposes this works fine and we don't need to worry about their differences for now.

In [None]:
response = urllib2.urlopen(baseurl, script)
u = response.read()
temp = StringIO(u)
try:
    t = Table.read(temp, format='votable')
    print t
except ValueError:
    print u
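
The difference between the two request types can be inspected without contacting the server. The sketch below assumes Python 3, where `urllib2.urlopen(url, data)` becomes `urllib.request.urlopen` and the POST body must be bytes. It only builds the request objects; the shortened script is for illustration.

```python
from urllib.parse import quote_plus
from urllib.request import Request

baseurl = 'http://simbad.u-strasbg.fr/simbad/sim-script'
script = 'output console=off script=off\nquery id sirius\n'
payload = 'script=' + quote_plus(script)

# GET: the parameters travel in the URL itself
get_req = Request(baseurl + '?' + payload)

# POST: the same parameters travel in the request body (bytes in Python 3)
post_req = Request(baseurl, data=payload.encode('ascii'))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Passing a `data` argument is what switches the request from GET to POST, which is why the cell above can reuse `baseurl` unchanged.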

For a full list of fields to request from Simbad, as well as instructions on other Simbad queries, check out their [help page](http://simbad.u-strasbg.fr/simbad/sim-help?Page=sim-fscript).

## ADS API

Next, let's have a look at the [ADS](http://www.adsabs.harvard.edu/) API. Let's start with the old, classic way of searching for papers.

In [None]:
baseurl = 'http://adsabs.harvard.edu/cgi-bin/basic_connect'
query = '^Rodriguez,D 2010-'
query = 'qsearch=' + urllib.quote_plus(query)
url = baseurl + '?' + query
print url

In [None]:
response = urllib2.urlopen(url)
u = response.read()

The result from this is just the HTML of the ADS results page. In order to get something out of this, we will parse it and grab the relevant information. We'll use [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) to do this.

In [None]:
soup = BeautifulSoup(u, 'html.parser')
authors = soup.find_all('td', width='25%')
titles = soup.find_all('td', align="left", valign="top", colspan=3)
for i in range(len(authors)):
    print authors[i].get_text(), titles[i].string
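
The idea behind this kind of scraping is to select table cells by their attributes. As a rough illustration that runs without BeautifulSoup, the sketch below swaps in the standard library's `html.parser` and applies the same attribute test (`width="25%"`) to a made-up HTML fragment. The fragment and the `CellCollector` class are hypothetical; real ADS result pages are more complex.

```python
from html.parser import HTMLParser

# Made-up HTML fragment for illustration only
html = """
<table>
  <tr><td width="25%">Author, A.</td><td align="left">First placeholder title</td></tr>
  <tr><td width="25%">Author, B.</td><td align="left">Second placeholder title</td></tr>
</table>
"""

class CellCollector(HTMLParser):
    """Collect the text of <td> cells, split by whether width == '25%'."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self.titles = []
        self._target = None  # list that receives the current cell's text

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            is_author = dict(attrs).get('width') == '25%'
            self._target = self.authors if is_author else self.titles
            self._target.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._target = None

    def handle_data(self, data):
        # Text outside a <td> (such as whitespace between rows) is ignored
        if self._target is not None:
            self._target[-1] += data

parser = CellCollector()
parser.feed(html)

# zip pairs authors with titles and stops at the shorter list
for author, title in zip(parser.authors, parser.titles):
    print(author, '|', title)
```

Pairing the two lists with `zip` avoids an `IndexError` if the page yields fewer titles than authors, which indexing both lists by `range(len(authors))` would not.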

It's good to know how to parse HTML and get useful information out of websites.
However, ADS has a new [interface](https://ui.adsabs.harvard.edu/) and a new way to call their [API](http://adsabs.github.io/help/api/). To access it, you'll need a personal authorization token which you can get on your user page when you log in. This token should be kept private.

In [None]:
my_token = 'K1aaAoEAoszEBSv3Y6xvFikCCjjmuuxcu9Z0KjaR'

We'll be using the **[requests](http://docs.python-requests.org/en/latest/index.html)** package, as it can handle API authorization such as tokens and passwords. Anything we could do with **urllib**, we can pretty much do with **requests**.

In [None]:
baseurl = 'https://api.adsabs.harvard.edu/v1/search' + '/query'
my_search = {}
my_search['q'] = ['author:^Rodriguez,D','year:[2010 TO *]']  # the query
my_search['fl'] = 'bibcode,author,title,citation_count,pubdate' # what to output
my_search['sort'] = 'citation_count desc' # how to sort
my_search['rows'] = 20 # how many entries to return
my_auth = {'Authorization': 'Bearer ' + my_token}  # scheme name, a space, then the token
r = requests.get(baseurl, params=my_search, headers=my_auth)
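
Since the token should stay private, one option is to keep it out of the notebook entirely. The sketch below, assuming Python 3 and the standard library only, reads it from an environment variable and shows how the search parameters and the authorization header end up in the request. `ADS_API_TOKEN` is a hypothetical variable name, and the `Bearer <token>` form with a space follows the usual OAuth 2.0 bearer convention, which is worth checking against the current ADS documentation. No request is sent.

```python
import os
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# ADS_API_TOKEN is a hypothetical environment variable name;
# the fallback is a placeholder, not a working token
token = os.environ.get('ADS_API_TOKEN', 'placeholder-token')

baseurl = 'https://api.adsabs.harvard.edu/v1/search/query'
my_search = {
    'q': ['author:^Rodriguez,D', 'year:[2010 TO *]'],
    'fl': 'bibcode,author,title,citation_count,pubdate',
    'sort': 'citation_count desc',
    'rows': 20,
}

# doseq=True repeats the key for each list element (q=...&q=...),
# which is also how requests encodes list values
query = urlencode(my_search, doseq=True)
req = Request(baseurl + '?' + query,
              headers={'Authorization': 'Bearer ' + token})

print(req.get_method(), req.full_url)
print(parse_qs(urlsplit(req.full_url).query)['q'])
```

Printing the full URL before sending it makes it easier to see how the query was escaped when a search returns nothing.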

The output from such a request is a JSON (JavaScript Object Notation) formatted string.

In [None]:
r.json()

We'll use **simplejson** to parse the JSON information into something we can use.

In [None]:
data = simplejson.loads(r.text)['response']['docs']

for i in range(len(data)):
    #if (data[i]['author'][0]!='Rodriguez, David R.'): continue
    print data[i]['bibcode'], data[i]['citation_count'], data[i]['pubdate'], data[i]['author'][0], data[i]['title'][0] 
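
The loop above assumes every record carries every requested field. As a minimal sketch, assuming a record can come back without one of them, the example below parses a made-up response of the same shape (`['response']['docs']`) with the standard library's `json` module and uses `.get` to supply defaults. The records are placeholders, not real papers.

```python
import json

# Made-up response text in the shape indexed above; placeholder records only
text = """
{"response": {"docs": [
  {"bibcode": "0000PLACE...1....1A", "author": ["Author, A.", "Author, B."],
   "title": ["First placeholder title"], "citation_count": 12,
   "pubdate": "2012-05-00"},
  {"bibcode": "0000PLACE...2....2B", "author": ["Author, C."],
   "title": ["Second placeholder title"], "pubdate": "2014-11-00"}
]}}
"""

docs = json.loads(text)['response']['docs']

summary = []
for doc in docs:
    # .get supplies a default when a field is absent from a record
    cites = doc.get('citation_count', 0)
    first_author = doc.get('author', ['(no author)'])[0]
    title = doc.get('title', ['(no title)'])[0]
    summary.append((doc['bibcode'], cites, first_author, title))
    print(doc['bibcode'], cites, doc.get('pubdate'), first_author, title)
```

Iterating over the records directly, rather than over `range(len(data))`, also keeps each line of the loop shorter.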

Refer to the [ADS API Github page](https://github.com/adsabs/adsabs-dev-api) for more examples of the sorts of searches you can do and the fields you can ask for.

## Twitter API

Finally, let's have a look at Twitter. While not exactly part of an astronomer's toolset, astronomers often use Twitter at conferences to communicate with one another and with the public. Let's create a simple example where we access the API and get results for tweets about #astroHackNY.

You need special authorization ([OAuth](http://oauth.net/)) to query the [Twitter API](https://dev.twitter.com/overview/documentation). You'll need to go to Twitter's [Application Page](https://apps.twitter.com/), log in with your account, and create an application. I've done this with my account and saved the resulting keys and tokens to a file in JSON format, which I read below.

In [None]:
with open("twitter_secrets.json.nogit") as f:
    secrets = simplejson.loads(f.read())

# OAuth1 is a class provided by requests_oauthlib
auth = OAuth1(
    secrets["api_key"],
    secrets["api_secret"],
    secrets["access_token"],
    secrets["access_token_secret"]
)
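
The secrets file itself is not shown here. As a sketch of the layout the cell above expects, inferred only from the four keys it reads, the example below writes a file with placeholder values to a temporary directory, loads it back, and checks that no key is missing. The placeholder strings are hypothetical, and the standard `json` module stands in for **simplejson**.

```python
import json
import os
import tempfile

# Placeholder values; the real ones come from your own Twitter application
placeholder = {
    "api_key": "YOUR_API_KEY",
    "api_secret": "YOUR_API_SECRET",
    "access_token": "YOUR_ACCESS_TOKEN",
    "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET",
}

# Write the example file into a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'twitter_secrets.json.nogit')
with open(path, 'w') as f:
    json.dump(placeholder, f, indent=2)

# Load it back the same way the notebook does
with open(path) as f:
    secrets = json.load(f)

# Check for the keys used in the OAuth1 call before building it
required = ['api_key', 'api_secret', 'access_token', 'access_token_secret']
missing = [key for key in required if key not in secrets]
print('missing keys:', missing)
```

Keeping the credentials in a separate file that is excluded from version control means the notebook can be shared without exposing them.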

Now we can structure our request in pretty much the same way as before.

In [None]:
params = {'q':'#AstroHackNY', 'result_type':'recent', 'count':20}
url = 'https://api.twitter.com/1.1/search/tweets.json'
r = requests.get(url, auth=auth, params=params)

In [None]:
data = simplejson.loads(r.text)['statuses']

for i in range(len(data)):
    print data[i]['user']['screen_name'] + ': ' + data[i]['text'] + '  (' + data[i]['created_at'] + ')'

The Twitter API is quite complex, so I recommend experimenting and reading through their documentation.

## Conclusion

There are many other APIs that you can access. [Here is a list](http://dotastronomy.com/events/hackdays/nyc2012/apis/) of some astronomy-related ones, but the tools you learned here carry over to more general APIs as well. Now go forth and mine some data!