# Gaia Data

### On June 13, 2022 the [Gaia project](https://www.cosmos.esa.int/web/gaia/dr3) released its third major data release, containing about 1.5 billion sources.

- For Astro 300, we will use a subset of the main data source. 

- This subset is still really large (1906.8 GB), so we will use Python to access this data in an efficient manner.

- #### The Gaia database we will use is called `gaiadr3.gaia_source_lite`

In [None]:
import numpy as np
from astropy.table import QTable
from astroquery.gaia import Gaia

---
# SQL/ADQL Database query language
 
SQL (Structured Query Language) is a language designed for managing data held in a relational database management system. SQL has become the most widely used database language.

Astronomical Data Query Language (ADQL) is a specialised variant of SQL developed for use with the proliferation of astronomical datasets, and extends the functionality of SQL in an astronomical context.

[The Gaia ADQL cookbook](https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook) is a great resource for learning the ADQL syntax.


## ADQL Query

A typical ADQL query has the form:

```
SELECT 
{columns}
FROM {database}
WHERE {conditions}
```

By convention, ADQL keywords (`SELECT`, `FROM`, `WHERE`, ...) are written in ALL CAPS, while column and table names are written in lowercase.

Here is a real example of an ADQL query to get the columns `source_id`, `ra`, `dec`, and `parallax` from the `gaiadr3.gaia_source_lite` database for all objects where the value of the `parallax` column is greater than 200 mas. The rows will be ordered by decreasing values of `parallax`:

```
SELECT TOP 10
source_id, ra, dec, parallax
FROM gaiadr3.gaia_source_lite
WHERE parallax > 200.0
ORDER BY parallax DESC
```
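
To get a feel for what the `parallax > 200.0` cut means, a parallax in milliarcseconds can be turned into a rough distance in parsecs with d = 1000 / parallax. The helper below is a minimal sketch; the name `parallax_to_distance_pc` is made up for illustration, and the simple inversion ignores measurement uncertainty, which is only reasonable for large, well-measured parallaxes like these.

```python
def parallax_to_distance_pc(parallax_mas):
    """Rough distance in parsecs from a parallax in milliarcseconds.

    Hypothetical helper for illustration: d [pc] = 1000 / parallax [mas].
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive to invert it into a distance")
    return 1000.0 / parallax_mas


# The cut used in the query: 200 mas corresponds to 5 pc
print(parallax_to_distance_pc(200.0))  # 5.0
```

So this query only returns objects closer than about 5 parsecs.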

#### It is really good to add `TOP 10` to the `SELECT` when you first do a query, so you do not drop millions of lines into your notebook!

---
## Let's get some data

- First we create the query as a triple-quoted (multi-line) Python string

In [None]:
query_one = """
SELECT TOP 10
source_id, ra, dec, parallax
FROM gaiadr3.gaia_source_lite
WHERE parallax > 200
ORDER BY parallax DESC
"""

In [None]:
print(query_one)
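
Writing the query by hand works well for a one-off search. As a sketch of an alternative, the same string could be assembled from its parts following the `SELECT` / `FROM` / `WHERE` template. The function `build_adql` is a hypothetical helper written for this example; it is not part of `astroquery`.

```python
def build_adql(columns, table, conditions=None, order_by=None, top=None):
    """Assemble a simple ADQL query string (hypothetical helper)."""
    # Add TOP n to the SELECT only when a row limit is requested
    select = "SELECT" if top is None else f"SELECT TOP {int(top)}"
    lines = [select, ", ".join(columns), f"FROM {table}"]
    if conditions:
        # Multiple conditions are combined with AND, one per line
        lines.append("WHERE " + "\nAND ".join(conditions))
    if order_by:
        lines.append(f"ORDER BY {order_by}")
    return "\n".join(lines)


query = build_adql(
    columns=["source_id", "ra", "dec", "parallax"],
    table="gaiadr3.gaia_source_lite",
    conditions=["parallax > 200"],
    order_by="parallax DESC",
    top=10,
)
print(query)
```

Keeping `top` as an argument makes it easy to start with a small limit and raise it once the query looks right.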

## Submit our query to the Gaia archive server

In [None]:
my_job_query = Gaia.launch_job(query_one)

### Check the status of the job

In [None]:
print(my_job_query)

### Looks good so get the results

- The results will be a nice astropy `Table`

In [None]:
my_parallax_table = my_job_query.get_results()

In [None]:
my_parallax_table

---

# A more complicated example

Let's say you want to find all of the objects within a certain area of the sky

<img src="https://uwashington-astro300.github.io/A300_images/Finder.png" width="400"/>

The ADQL language includes lots of astronomy-specific stuff like `POINT` and `CIRCLE`. This allows us to specify a point on the sky and a region around that point.

The coordinates of `POINT` and `CIRCLE` are usually set to `ICRS` (International Celestial Reference System).

`POINT('ICRS', RA(deg), DEC(deg))` specifies a point on the celestial sphere.

`CIRCLE('ICRS', RA(deg), DEC(deg), radius(deg))` specifies a circular region on the celestial sphere centred at the given coordinates and with the given radius in degrees.

`CONTAINS` is a strange but super useful ADQL function. Setting `CONTAINS(POINT, CIRCLE) = 1` picks out all objects for which `POINT` lies inside `CIRCLE`.
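
Conceptually, a `CONTAINS(POINT, CIRCLE) = 1` test asks whether the angular separation between the point and the circle's centre is no larger than the radius. The sketch below mimics that check in plain Python using the haversine formula. The function names are invented for illustration, and this is not how the Gaia archive actually evaluates the condition.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_circle(ra, dec, center_ra, center_dec, radius_deg):
    """True if (ra, dec) lies within radius_deg of the circle centre."""
    return angular_separation_deg(ra, dec, center_ra, center_dec) <= radius_deg


# A source 0.2 deg east and 0.1 deg north of the field centre is inside a 0.5 deg circle
print(in_circle(23.7, 0.1, 23.5, 0.0, 0.5))   # True
# A source 1 deg away along the equator is not
print(in_circle(24.5, 0.0, 23.5, 0.0, 0.5))   # False
```

Because separation is symmetric, it does not matter whether the circle is centred on the target position or on each source; both forms select the same objects.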

#### *Here is a query to find all Gaia objects within 0.5 degrees of RA = 23.5 deg, Dec = 0.0 deg that are brighter than 12th mag and have color (BP-RP) data.*

In [None]:
query_two = """
SELECT TOP 100
source_id, ra, dec, phot_g_mean_mag, bp_rp
FROM gaiadr3.gaia_source_lite
WHERE CONTAINS(
   POINT('ICRS', 23.5, 0.0),
   CIRCLE('ICRS', ra, dec, 0.5)
   ) = 1
AND phot_g_mean_mag < 12.0
AND bp_rp IS NOT NULL
ORDER BY bp_rp ASC
"""

In [None]:
print(query_two)

In [None]:
my_job_query = Gaia.launch_job(query_two)

In [None]:
print(my_job_query)

In [None]:
my_finder_table = my_job_query.get_results()

In [None]:
my_finder_table

In [None]:
my_finder_table.show_in_notebook()
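
To point the same search at a different field, the centre, radius, and magnitude limit can be filled into the query with an f-string. This is a minimal sketch; `cone_search_query` and its default values are assumptions made for this example, not something provided by `astroquery`.

```python
def cone_search_query(ra_deg, dec_deg, radius_deg, max_g_mag=12.0, top=100):
    """Return an ADQL cone-search string for gaiadr3.gaia_source_lite (hypothetical helper)."""
    return f"""
SELECT TOP {int(top)}
source_id, ra, dec, phot_g_mean_mag, bp_rp
FROM gaiadr3.gaia_source_lite
WHERE CONTAINS(
   POINT('ICRS', {ra_deg}, {dec_deg}),
   CIRCLE('ICRS', ra, dec, {radius_deg})
   ) = 1
AND phot_g_mean_mag < {max_g_mag}
AND bp_rp IS NOT NULL
ORDER BY bp_rp ASC
"""


# Reproduces the field used above: 0.5 deg around RA = 23.5 deg, Dec = 0.0 deg
query = cone_search_query(23.5, 0.0, 0.5)
print(query)
```

The resulting string can be passed to `Gaia.launch_job` in the same way as the hand-written queries.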

## What else is in the field?

#### I used the [ESO Online Digitized Sky Survey](http://archive.eso.org/dss/dss) to get an image of the field

<img src="https://uwashington-astro300.github.io/A300_images/StarField.jpg" width="500"/>

In [None]:
query_three = """
SELECT TOP 200
source_id, ra, dec, phot_g_mean_mag, in_galaxy_candidates
FROM gaiadr3.gaia_source_lite
WHERE CONTAINS(
   POINT('ICRS', 23.5, 0.0),
   CIRCLE('ICRS', ra, dec, 0.5)
   ) = 1
AND in_galaxy_candidates = 'True'
ORDER BY phot_g_mean_mag ASC
"""

In [None]:
print(query_three)

In [None]:
my_job_query = Gaia.launch_job(query_three)

In [None]:
print(my_job_query)

In [None]:
my_strange_star = my_job_query.get_results()

In [None]:
my_strange_star

---

## ADQL queries can get SUPER complicated! I have shown you the merest baby steps.

## If you want to see how the pros work, check out the [Gaia ADQL Guide](https://www.cosmos.esa.int/web/gaia-users/archive/writing-queries)