# Programmatic Data Access

In tutorial `3.2.2 - How to query an API`, we explored how to access different
services using their APIs and the `requests` module. In this tutorial, we take a look at two `python` packages
that take care of this low-level connection and thereby simplify our access to databases. This is particularly useful
when we want to efficiently query data for a
large number of objects. The packages we look at are `astroquery` and
`rocks`. Internally, both packages use APIs and the `requests` package to provide a direct programmatic interface to their databases.

The tutorial focuses on physical parameters of asteroids and comets. For meteorites, refer to the
lecture slides (and the expert-level tutorial).

## `astroquery`

[astroquery](https://astroquery.readthedocs.io/en/latest/) is an
[astropy](https://www.astropy.org/)-affiliated package providing access to
astronomical databases and catalogues. The strength of this package is the
large [number of
services](https://astroquery.readthedocs.io/en/latest/#available-services) that
can be queried with a uniform syntax. It can retrieve mission data from ESA's
planetary science archive, execute cone-searches in various catalogues (as we
saw on Tuesday), and provide data on a wide range of astrophysical objects.

For our purposes, we are interested in services that provide data on small bodies. We can split them into two categories:

**Orbital Parameters and Ephemerides**
- [IMCCE](https://astroquery.readthedocs.io/en/latest/imcce/imcce.html) - For asteroids and comets
- [JPLHorizons](https://astroquery.readthedocs.io/en/latest/jplhorizons/jplhorizons.html) - For asteroids and comets
- [MPC](https://astroquery.readthedocs.io/en/latest/mpc/mpc.html) - For asteroids and comets
- [NEODys](https://astroquery.readthedocs.io/en/latest/solarsystem/neodys/neodys.html) - For near-Earth asteroids

**Physical Parameters**
- [MPC](https://astroquery.readthedocs.io/en/latest/mpc/mpc.html) - For asteroids and comets
- [JPL Small-Body Database](https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html) - For asteroids and comets

Notebook `3.2 - Exercises` looks at the computation of ephemerides. We continue here with the physical parameters only, starting
with the Small-Body Database (SBDB).

### JPL Small-Body Database

The standard approach to query a service with `astroquery` is to import the service class and run a query function.
We see this pattern below with the `SBDB` class and its `query` function. The `query` function accepts a string as input
which identifies the object of interest by its name, number, or designation. To get physical parameters, we
have to set `phys=True`. We use the `pprint` function from the `python` standard library to pretty-print the result (try running the regular `print` command
instead to see the benefit).

In [None]:
from astroquery.jplsbdb import SBDB
from pprint import pprint  # the pretty-printer for more readable output

data = SBDB.query('14328', phys=True)

pprint(data)

The output of this query is an `OrderedDict` - think of it like a regular `python` dictionary for now, where we access the
different values by indexing (this operation: `dict['key']`). The `data` object contains different kinds of parameters:

In [None]:
data.keys()

The physical ones are stored under the `phys_par` key.

In [None]:
data['phys_par']

We have data!

In [None]:
print(f"{data['object']['shortname']} has an albedo of {data['phys_par']['albedo']} +- {data['phys_par']['albedo_sig']} according to {data['phys_par']['albedo_ref']}.")

The example asteroid we chose here has only a few known physical parameters.
Change the query example above to e.g. `ceres` to see a more exhaustive list of
the parameters that JPL provides, such as density, taxonomy, and rotation period.
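
Because the set of available parameters differs from object to object, indexing a key that is missing raises a `KeyError`. Below is a minimal sketch of a more defensive lookup. The helper `get_nested` is hypothetical (it is not part of `astroquery`), and the sample dictionary only mimics the layout of an SBDB response with made-up placeholder values.

```python
def get_nested(mapping, *keys, default=None):
    """Walk a nested dictionary and return `default` if any key is missing."""
    current = mapping
    for key in keys:
        # Stop as soon as we reach something that is not a dictionary
        # or that does not contain the requested key
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Placeholder structure mimicking the layout of an SBDB response.
# The values are made up for illustration, they are not real measurements.
sample = {
    "object": {"shortname": "Example"},
    "phys_par": {"albedo": 0.1, "albedo_sig": 0.01},
}

albedo = get_nested(sample, "phys_par", "albedo")
density = get_nested(sample, "phys_par", "density", default="unknown")
print(albedo, density)
```

Since an `OrderedDict` is a subclass of `dict`, the same helper should also work on the `data` object returned by the query above.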

Using the `SBDB.query` function in a loop, we can thus get physical parameters
for a lot of objects quickly. It is not efficient yet, as we are sending one
query per object to the server which takes some time, but it is already much
more convenient than going through the web interface.
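
Such a loop could be sketched as follows. The function `collect_parameter` and its `pause` argument are hypothetical names chosen for this illustration. To keep the sketch runnable without a network connection, it uses a stand-in `fake_query` with placeholder values instead of the real service.

```python
import time


def collect_parameter(identifiers, query, parameter, pause=0.0):
    """Query each identifier in turn and collect one physical parameter.

    `query` is any callable that takes an identifier and returns a
    dictionary shaped like an SBDB response.
    """
    results = {}
    for identifier in identifiers:
        try:
            data = query(identifier)
        except Exception as error:  # e.g. a network problem or unknown object
            print(f"Query for {identifier} failed: {error}")
            continue
        # Objects without the requested parameter are stored as None
        results[identifier] = data.get("phys_par", {}).get(parameter)
        time.sleep(pause)  # optional pause between two requests
    return results


# Stand-in for the real service so that the sketch runs offline.
# The values are placeholders, not real measurements.
def fake_query(identifier):
    fake_database = {"A": {"phys_par": {"albedo": 0.1}}, "B": {"phys_par": {}}}
    return fake_database[identifier]


albedos = collect_parameter(["A", "B", "C"], fake_query, "albedo")
print(albedos)
```

To run this against the real database, we could pass `lambda name: SBDB.query(name, phys=True)` as the `query` argument. Passing the query as an argument keeps the loop independent of the service, which also makes it easy to test.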

The SBDB also contains physical parameters of comets.

In [None]:
data = SBDB.query('67P', phys=True)

pprint(data['phys_par'])

### Minor Planet Center

The Minor Planet Center database is accessible via the `MPC` class. We use the
`query_object` function to specify the type (`asteroid` or `comet`) and the
`name`, `number`, or `designation` of the target.

In [None]:
from astroquery.mpc import MPC

result = MPC.query_object('asteroid', name='ceres')
pprint(result)

We see that, compared to the physical parameters available through SBDB, the ones in the MPC
database are rather limited (it's really only the `absolute_magnitude`).

In [None]:
result = MPC.query_object('comet', designation='67P')
pprint(result)

Nevertheless, this service deserves a mention here as it allows us to query for asteroids and comets based on *shared orbital parameters*. We switch
to the `query_objects` function and look for asteroids with a maximum aphelion distance of 1 AU. We only want their names, numbers, and designations returned.

In [None]:
targets = MPC.query_objects('asteroid', aphelion_distance_max=1, return_fields=['name,number,designation'])
print(f"Found {len(targets)} asteroids matching these orbital criteria.")
pprint(targets[:5])

This means that we can now dynamically define orbital populations that we want to study. In this context, "dynamically" means that we can change
the population that we study by changing the query parameters, rather than providing a *static* list of objects.
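
One way to make this explicit in code is to store the constraints that define a population in a dictionary and to pass them on to the query function. The sketch below is only an illustration: `select_population` is a hypothetical helper, and `fake_query_objects` is a stand-in that echoes the request so that the example runs offline.

```python
def select_population(query_objects, constraints, fields):
    """Run one population query described by a dictionary of constraints."""
    return query_objects("asteroid", return_fields=fields, **constraints)


# The population is defined by its constraints, not by a list of names
inner_objects = {"aphelion_distance_max": 1}


# Stand-in for MPC.query_objects which only echoes the request,
# so that the sketch runs without a network connection
def fake_query_objects(target_type, **kwargs):
    return {"target_type": target_type, **kwargs}


request = select_population(
    fake_query_objects, inner_objects, ["name,number,designation"]
)
print(request)
```

Passing `MPC.query_objects` instead of `fake_query_objects` should send the same request as the cell above. Changing the entries of `inner_objects` then changes the population without touching the rest of the code.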

For example, let's combine SBDB and MPC to get all albedos of the objects with aphelia inside the Earth's orbit.

In [None]:
targets = MPC.query_objects('asteroid', aphelion_distance_max=1, return_fields=['name,number,designation'])

albedos = []

for target in targets:

    # We have to check whether we pass the number or the designation as the
    # MPC reports "designation: None" as soon as an object is numbered
    identifier = target['designation'] if target['designation'] is not None else target['number']
    data = SBDB.query(identifier, phys=True)

    # Check if albedo is known
    if 'albedo' in data['phys_par']:
        albedos.append((data['object']['fullname'], data['phys_par']['albedo']))

print(len(albedos))
print(albedos)

Well, we cannot do much with that, but you get the idea.

### Dynamical physical groups and the art of good code syntax

We see that we can get different data with `astroquery`,
making it a great package especially when you want to combine results from
different services. The uniform syntax to query different services makes it easy to get started.

There are two points that we might improve upon. First, while we can
dynamically define objects-of-interest based on orbital parameters with the
MPC-SBDB combination, we cannot do so based on physical parameters ("Give me
all B-types in the Themis family"). There is simply no service in the `astroquery` universe
*for now* that offers this.

Second, the responses to our queries are typically `astropy` tables, `pandas`
dataframes, or dictionaries. This leads you to think about your analysis in
terms of table rows and columns ("The data I need is in row 4 column 'albedo'",
"I need to access the cells where the row matches this condition"), which
translates into uncomfortable coding and analysis. The same goes for
dictionaries: As we see in the code above, it is rather tedious to access the
nested values that we are interested in. Instead, we want to think and write code
in a way that more closely resembles our language: "Give me the albedo of Vesta". This is a core concept
of *object-oriented programming* - we will see this in action now.

*(Granted, the second point is somewhat subjective, but I (MM) am writing this tutorial, so I get to decide
what to put here)*
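
To illustrate the idea, here is a minimal sketch of such an object-oriented wrapper around the nested dictionary we used above. The classes `Parameter` and `SmallBody` are hypothetical and not part of `astroquery` or any other package, and the data are placeholders rather than real measurements.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    """A measured value together with its uncertainty and its reference."""
    value: float
    error: float
    reference: str


class SmallBody:
    """Expose the nested dictionary of a query result via attributes."""

    def __init__(self, data):
        self.name = data["object"]["shortname"]
        self._phys = data.get("phys_par", {})

    @property
    def albedo(self):
        # Bundle the three related dictionary entries into one object
        return Parameter(
            value=self._phys["albedo"],
            error=self._phys["albedo_sig"],
            reference=self._phys["albedo_ref"],
        )


# Placeholder data following the dictionary layout used above
example_data = {
    "object": {"shortname": "Example"},
    "phys_par": {"albedo": 0.1, "albedo_sig": 0.01, "albedo_ref": "Some reference"},
}

body = SmallBody(example_data)
print(f"{body.name} has an albedo of {body.albedo.value} +- {body.albedo.error}.")
```

The dictionary keys are now written once, inside the class. The analysis code only contains expressions like `body.albedo.value`, which read much closer to the question we are asking.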

We now move on to the second package, `rocks`.

## `rocks` - client for SsODNet

`rocks` is the `python` interface of the
[SsODNet](https://ssp.imcce.fr/webservices/ssodnet) services of the IMCCE. Like
`astroquery`, it takes care of the connection to the server via the API,
greatly facilitating the queries. Unlike `astroquery`, it also transforms the
server results and presents them in a way that is better suited for
scripted analysis: as a `python` class object, the `Rock`. We will see the
benefits of this below.

Let's see an example. To get data on a given object, you pass the name, number,
or designation to the `Rock` class.

In [None]:
import rocks

# retrieve and ingest ssoCard of (1) Ceres
ceres = rocks.Rock(1)

ceres

Behind the scenes, `rocks` identified our object of interest as (1) Ceres and
downloaded its [ssoCard](https://ssp.imcce.fr/webservices/ssodnet/api/ssocard),
i.e. the best estimates of a large number of physical and dynamical parameters.
You can access this data via the dot-notation.

In [None]:
print(ceres.name)
print(ceres.number)
print(ceres.class_)

For numerical parameters like the albedo, you can use the `value` attribute to access
the albedo itself, and `error.min` and `error.max` to get the lower and upper errors.
Alternatively, `error_` gives the mean of `error.min` and `error.max`.

In [None]:
print(ceres.albedo.value)
print(ceres.albedo.error_)

SsODNet only collects parameters which are supported by a peer-reviewed publication.
You can trace the parameter value you use by accessing the `bibref` attribute.

In [None]:
print(ceres.mass.value)
print(ceres.mass.bibref.shortbib)
print(ceres.mass.bibref.bibcode)

## Comparison to `astroquery` and SBDB

We see that accessing the data is intuitive and requires minimal code. The
small body is the principal object that we are working with, rather than being
one row in a `pandas.DataFrame` that we have to awkwardly index to access any
parameter. This is the benefit of using a specialised library like `rocks` to
access a single service. The downside compared to `astroquery` is that `rocks`
only connects to SsODNet while `astroquery` offers many more data products.

In terms of physical parameters, SsODNet offers more information than the SBDB.
Other parameters that are available through `rocks` include the
absolute magnitude, colors, density, diameter, mass, phase function, spin,
taxonomy, and thermal inertia. You can have a look at the
[documentation](https://rocks.readthedocs.io) to find out more.

A downside of `rocks`/SsODNet compared to the SBDB is that currently only
asteroids are supported: comets are a work-in-progress.

## Many Objects

We continue exploring `rocks`. If you have more than one asteroid in mind,
you use the `rocks.rocks` function to create many `Rock`s.

In [None]:
targets = rocks.rocks(['ceres', 2, '1804 RA', '4'])

for target in targets:
    print(f"The mass of ({target.number}) {target.name} is {target.mass.value:.2}+-{target.mass.error_} [{target.mass.bibref.shortbib}].")

For fun, let's graph the diameter versus the mass of the first 100 numbered asteroids, and colour-code them by their albedos.

In [None]:
import matplotlib.pyplot as plt

# get data of first 100 numbered asteroids
targets = rocks.rocks(range(1, 101))

# get the parameters that we are interested in
diameters = [target.diameter.value for target in targets]
masses = [target.mass.value for target in targets]
albedos = [target.albedo.value for target in targets]

# and plot
fig, ax = plt.subplots()

scat = ax.scatter(diameters, masses, c=albedos, cmap='cool')
ax.set(xlabel='Diameter / km', ylabel='Mass / kg', xscale='log', yscale='log')
fig.colorbar(scat, label='Albedo')
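
Not every asteroid has a measured diameter, mass, and albedo. The sketch below shows one way to keep only the objects for which all three values are available. It assumes that missing values show up as `NaN` or `None`, which you should verify for your own data. The function `drop_incomplete` is a hypothetical helper and the numbers are placeholders, not real measurements.

```python
import math


def drop_incomplete(*columns):
    """Keep only the rows in which every value is a finite number."""
    complete_rows = [
        row
        for row in zip(*columns)
        if all(value is not None and math.isfinite(value) for value in row)
    ]
    if not complete_rows:
        return [[] for _ in columns]
    # Transpose back into one list per parameter
    return [list(column) for column in zip(*complete_rows)]


# Placeholder numbers, not real measurements
example_diameters = [10.0, 20.0, 30.0]
example_masses = [1e15, float("nan"), 3e16]
example_albedos = [0.1, 0.2, None]

diameters_ok, masses_ok, albedos_ok = drop_incomplete(
    example_diameters, example_masses, example_albedos
)
print(f"Complete rows: {len(diameters_ok)} of {len(example_diameters)}")
```

Filtering explicitly like this makes it easy to report how many objects actually enter a figure.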

The plot is not quite ready for a *Nature* paper yet. However, if we planned to publish this, we should of course cite our sources. Below, we loop over our targets and add the bibliographic references of the diameter, mass, and albedo properties to one list.

In [None]:
sources = []

for target in targets:
    sources += target.diameter.bibref.bibcode
    sources += target.albedo.bibref.bibcode
    sources += target.mass.bibref.bibcode
sources

There are certainly duplicates in our list of sources. We remove them using the `set()` function.

In [None]:
sources = set(sources)
print(f"We have {len(sources)} sources to cite:")
sources

From here, we can copy-paste the list of bibcodes into our LaTeX files.
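
Instead of copy-pasting, we could also build the citation command in code. The sketch below assumes that the bibcodes are used as citation keys in the `.bib` file (this depends on how the file was exported) and that the `natbib` command `\citep` is available. The function `cite_command` is a hypothetical helper and the keys are placeholders standing in for real bibcodes.

```python
def cite_command(bibcodes, command="citep"):
    """Build a LaTeX citation command from an iterable of bibcodes."""
    keys = sorted(set(bibcodes))  # remove duplicates and fix the order
    return "\\" + command + "{" + ",".join(keys) + "}"


# Placeholder keys standing in for real bibcodes
example_sources = ["KEY_B", "KEY_A", "KEY_B"]
print(cite_command(example_sources))
```

Sorting the keys gives the same string on every run, which keeps the differences small when the LaTeX file is under version control.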

We just did an analysis of a lot of objects with a few lines of code, including the often tedious bibliography
management. If our targets change, we can update the definition of our target list and just re-run the code to produce
all relevant figures and citations. Nice!

That's it for the basic introduction to getting physical parameters of asteroids and comets with `astroquery` and `rocks`. The advanced
part of this tutorial shows how to select many objects based on shared physical properties using `rocks`, followed by a general application of `astroquery` and `rocks` to a catalogue of asteroid observations. In the expert-level tutorial,
we build our own little tool to get meteorite classifications programmatically from the Meteoritical Bulletin.