# 1. Queries
This is the first in a series of lessons about working with astronomical data.

As a running example, we will replicate parts of the analysis in a recent paper, "[Off the beaten path: Gaia reveals GD-1 stars outside of the main stream](https://arxiv.org/abs/1805.00425)" by Adrian Price-Whelan and Ana Bonaca.

## Outline
This lesson demonstrates the steps for selecting and downloading data from the Gaia Database:
1. First we'll make a connection to the Gaia server,
2. We will explore information about the database and the tables it contains,
3. We will write a query and send it to the server, and finally
4. We will download the response from the server.

## Query Language
In order to select data from a database, you have to compose a query, which is a program written in a "query language". The query language we'll use is ADQL, which stands for "Astronomical Data Query Language".

ADQL is a dialect of [SQL](https://en.wikipedia.org/wiki/SQL) (Structured Query Language), which is by far the most commonly used query language. Almost everything you will learn about ADQL also works in SQL.
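
To give a feel for what a query looks like before we connect to Gaia, here is a minimal sketch that runs a plain SQL query with Python's built-in `sqlite3` module. The table `stars`, its columns, and its values are made up for illustration; they are not Gaia data, and SQLite is a stand-in for the real server.

```python
import sqlite3

# Build a tiny in-memory database. The table name, column names, and
# values are hypothetical and exist only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (source_id INTEGER, ra REAL, parallax REAL)")
conn.executemany(
    "INSERT INTO stars VALUES (?, ?, ?)",
    [(1, 10.5, 0.8), (2, 200.1, 1.2), (3, 12.0, 0.3)],
)

# A query names the columns to return (SELECT), the table to read
# (FROM), and a condition the rows must satisfy (WHERE).
query = """
SELECT source_id, ra
FROM stars
WHERE ra < 100
ORDER BY source_id
"""

rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The `SELECT ... FROM ... WHERE` pattern shown here is the part that carries over to ADQL; the details of connecting and fetching results are specific to SQLite.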

[The reference manual for ADQL is here](https://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html). But you might find it easier to learn from [this ADQL Cookbook](https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook).

## Using Jupyter
If you have not worked with Jupyter notebooks before, you might start with [the tutorial on Jupyter.org called "Try Classic Notebook"](https://jupyter.org/try), or [this tutorial from DataQuest](https://www.dataquest.io/blog/jupyter-notebook-tutorial/).

There are two environments you can use to write and run notebooks:
* "Jupyter Notebook" is the original, and
* "Jupyter Lab" is a newer environment with more features.

For these lessons, you can use either one.

If you are too impatient for the tutorials, here are the most important things to know:
1. Notebooks are made up of code cells and text cells (and a few other less common kinds). Code cells contain code; text cells, like this one, contain explanatory text written in [Markdown](https://www.markdownguide.org/).
2. To run a code cell, click the cell to select it and press `Shift + Enter`. The output of the code should appear below the cell.
3. In general, notebooks only run correctly if you run every code cell in order from top to bottom. If you run cells out of order, you are likely to get errors.
4. You can modify existing cells, but then you have to run them again to see the effect.
5. You can add new cells, but again, you have to be careful about the order you run them in.
6. If you added or modified cells and the behavior of the notebook seems strange, you can restart the "kernel", which clears all of the variables and functions you have defined, and run the cells again from the beginning.

* If you are using Jupyter Notebook, open the `Kernel` menu and select "Restart and Run All".
* In Jupyter Lab, open the `Kernel` menu and select "Restart Kernel and Run All Cells".
* In Colab, open the `Runtime` menu and select "Restart and run all".

Before you go on, you might want to explore the other menus and the toolbar to see what else you can do.

## Installing Libraries
If you are running this notebook on Colab, you should run the following cell to install the libraries we'll need.

If you are running this notebook on your own computer, you might have to install these libraries yourself.

In [None]:
# If we're running on Colab, install libraries
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    %pip install astroquery
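
If you are working on your own computer, one way to find out whether a library is already installed is to ask Python whether it can locate the module. The following is a minimal sketch using the standard library; the helper name `is_installed` is our own, not part of any package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Report on the library this lesson needs.
for name in ["astroquery"]:
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If the library is reported as missing, install it with the package manager you normally use for your Python environment.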

## Connecting to Gaia
The library we'll use to get Gaia data is [Astroquery](https://astroquery.readthedocs.io/en/latest/). Astroquery provides `Gaia`, which is an [object that represents a connection to the Gaia database](https://astroquery.readthedocs.io/en/latest/gaia/gaia.html).

We can connect to the Gaia database like this:

In [None]:
from astroquery.gaia import Gaia

This import statement creates a [TAP+](http://www.ivoa.net/documents/TAP/) connection; TAP stands for "Table Access Protocol", which is a network protocol for sending queries to the database and getting back the results.

## Databases and Tables
What is a database, anyway? Most generally, it can be any collection of data, but when we are talking about ADQL or SQL:
* A database is a collection of one or more named tables.
* Each table is a 2-D array with one or more named columns of data.
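
To make these two definitions concrete, here is a small sketch that builds a database with two named tables and then asks the database for its own table and column names. It uses Python's built-in `sqlite3` module as a stand-in; the table and column names are hypothetical, and the way SQLite exposes this information differs from how the Gaia server does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables, each with named columns.
conn.execute("CREATE TABLE stars (source_id INTEGER, ra REAL, parallax REAL)")
conn.execute("CREATE TABLE photometry (source_id INTEGER, g_mag REAL)")

# SQLite records the tables it contains in a table called sqlite_master.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column; the column name is
# the second field of each row.
columns = {
    name: [col[1] for col in conn.execute(f"PRAGMA table_info({name})")]
    for name in table_names
}

for name, cols in columns.items():
    print(name, cols)
conn.close()
```

Information like this, which describes the tables rather than holding their contents, is what we mean by metadata.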

We can use `Gaia.load_tables` to get the names of the tables in the Gaia database. With the option `only_names=True`, it loads information about the tables, called "metadata", not the data itself.

In [None]:
tables = Gaia.load_tables(only_names=True)

The following `for` loop prints the names of the tables.

In [None]:
for table in tables:
    print(table.name)

So that's a lot of tables. The ones we'll use are:
* `gaiadr2.gaia_source`, which contains Gaia data from [data release 2](https://www.cosmos.esa.int/web/gaia/data-release-2),
* `gaiadr2.panstarrs1_original_valid`, which contains the photometry data we'll use from PanSTARRS, and
* `gaiadr2.panstarrs1_best_neighbour`, which we'll use to cross-match each star observed by Gaia with the same star observed by PanSTARRS.
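
With so many tables, it can help to list only the names that start with a given prefix. The sketch below shows one way to do that. So that it runs without a network connection, it uses stand-in objects that have only a `name` attribute, which is the one attribute the loop above relies on; the helper `names_with_prefix` and the last table name are hypothetical.

```python
from types import SimpleNamespace

# Stand-ins for the objects returned by Gaia.load_tables. The only
# attribute assumed here is `name`.
example_tables = [
    SimpleNamespace(name="gaiadr2.gaia_source"),
    SimpleNamespace(name="gaiadr2.panstarrs1_original_valid"),
    SimpleNamespace(name="gaiadr2.panstarrs1_best_neighbour"),
    SimpleNamespace(name="example_schema.example_table"),  # hypothetical
]


def names_with_prefix(tables, prefix):
    """Return the sorted names of the tables whose name starts with prefix."""
    return sorted(t.name for t in tables if t.name.startswith(prefix))


dr2_names = names_with_prefix(example_tables, "gaiadr2.")
for name in dr2_names:
    print(name)
```

In the notebook, you could pass the `tables` list loaded earlier in place of `example_tables`.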

We can use `load_table` (not `load_tables`) to get the metadata for a single table. The name of this function is misleading, because it only downloads metadata, not the contents of the table.

In [None]:
meta = Gaia.load_table('gaiadr2.gaia_source')
meta

Jupyter shows that the result is an object of type `TapTableMeta`, but it does not display the contents.

To see the meta data