## Using ADQL to download Gaia data

This is the first in a series of lessons related to astronomy data.  As a running example, we will replicate part of the analysis in a recent paper, "[Off the beaten path: Gaia reveals GD-1 stars outside of the main stream](https://arxiv.org/abs/1805.00425)" by Adrian M. Price-Whelan and Ana Bonaca.

As the abstract explains, "Using data from the Gaia second data release combined with Pan-STARRS photometry, we present a sample of highly-probable members of the longest cold stream in the Milky Way, GD-1."

GD-1 is a [stellar stream](https://en.wikipedia.org/wiki/List_of_stellar_streams) which is "an association of stars orbiting a galaxy that was once a globular cluster or dwarf galaxy that has now been torn apart and stretched out along its orbit by tidal forces."

The two datasets used in this study are
 
* [Gaia](https://en.wikipedia.org/wiki/Gaia_(spacecraft)), which is "a space observatory of the European Space Agency (ESA), launched in 2013 ... designed for astrometry: measuring the positions, distances and motions of stars with unprecedented precision", and

* [PanSTARRS](https://en.wikipedia.org/wiki/Pan-STARRS): "The Panoramic Survey Telescope and Rapid Response System, located at Haleakala Observatory, Hawaii, US, consists of astronomical cameras, telescopes and a computing facility that is surveying the sky for moving or variable objects on a continual basis, and also producing accurate astrometry and photometry of already-detected objects."

Both of these datasets are very large, which can make them challenging to work with.  One of the goals of this workshop is to provide tools for working with large datasets.

One of the most important of those tools is a "query language", which is a way to query a large database and efficiently select the information you need.  So that's where we'll start.

The query language we'll use is ADQL, which stands for "Astronomical Data Query Language".

ADQL is a dialect of [SQL](https://en.wikipedia.org/wiki/SQL) (Structured Query Language), which is by far the most commonly used query language.  Almost everything you learn about ADQL also works in SQL.

[The reference manual for ADQL is here](http://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html).

But you might find it easier to learn from [this ADQL Cookbook](https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook).

## Getting Gaia data

The library we'll use to get Gaia data is [Astroquery](https://astroquery.readthedocs.io/en/latest/).  If you are running this notebook on your own computer, you might have to install Astroquery.  You should have received instructions for this before the workshop.

If you are running this notebook on Colab, you can run the following cell to install Astroquery and a couple of other libraries we'll use.

In [1]:
# If we're running on Colab, install libraries

import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    !pip install astroquery astro-gala pyia
    !mkdir data

From Astroquery we can import `Gaia`, which is an [object that represents a connection to the Gaia database](https://astroquery.readthedocs.io/en/latest/gaia/gaia.html).

Running this import statement has the effect of creating a [TAP+](http://www.ivoa.net/documents/TAP/) connection; TAP stands for "Table Access Protocol".  It is a network protocol for sending queries to the database and getting back the results.

In [2]:
from astroquery.gaia import Gaia

Created TAP+ (v1.2.1) - Connection:
	Host: gea.esac.esa.int
	Use HTTPS: True
	Port: 443
	SSL Port: 443
Created TAP+ (v1.2.1) - Connection:
	Host: geadata.esac.esa.int
	Use HTTPS: True
	Port: 443
	SSL Port: 443


I'm not sure why it seems to create two connections.

What is a database, anyway?  Most generally, it can be any collection of data, but when we are talking about ADQL or SQL:

* A database is a collection of one or more named tables.

* Each table is a 2-D array with one or more named columns of data.
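
To make this definition concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a local stand-in for a remote database. The table and column names (`stars`, `photometry`, `g_mag`) are made up for illustration and are not part of the Gaia archive.

```python
import sqlite3

# Build a tiny in-memory database with two named tables (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (source_id INTEGER, ra REAL, dec REAL)")
conn.execute("CREATE TABLE photometry (source_id INTEGER, g_mag REAL)")

# List the names of the tables in the database.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)

# List the names of the columns in one table.
# Each row of PRAGMA table_info is (cid, name, type, notnull, default, pk).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(stars)")]
print(column_names)

conn.close()
```

Both results are metadata: they describe the structure of the database without reading any rows of data.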

We can use `load_tables` to get the names of the tables in the Gaia database.  With the option `only_names=True`, it loads information about the tables, the "metadata", not the data itself.

In [3]:
tables = Gaia.load_tables(only_names=True)
for table in tables:
    print(table.get_qualified_name())

INFO: Retrieving tables... [astroquery.utils.tap.core]
INFO: Parsing tables... [astroquery.utils.tap.core]
INFO: Done. [astroquery.utils.tap.core]
external.external.apassdr9
external.external.gaiadr2_geometric_distance
external.external.galex_ais
external.external.ravedr5_com
external.external.ravedr5_dr5
external.external.ravedr5_gra
external.external.ravedr5_on
external.external.sdssdr13_photoprimary
external.external.skymapperdr1_master
external.external.tmass_xsc
public.public.hipparcos
public.public.hipparcos_newreduction
public.public.hubble_sc
public.public.igsl_source
public.public.igsl_source_catalog_ids
public.public.tycho2
public.public.dual
tap_config.tap_config.coord_sys
tap_config.tap_config.properties
tap_schema.tap_schema.columns
tap_schema.tap_schema.key_columns
tap_schema.tap_schema.keys
tap_schema.tap_schema.schemas
tap_schema.tap_schema.tables
gaiadr1.gaiadr1.aux_qso_icrf2_match
gaiadr1.gaiadr1.ext_phot_zero_point
gaiadr1.gaiadr1.allwise_best_neighbour
gaiadr1.gaiad

So that's a lot of tables.  The ones we'll use are:

* gaiadr2.gaia_source, which contains Gaia data from [data release 2](https://www.cosmos.esa.int/web/gaia/data-release-2),

* gaiadr2.panstarrs1_original_valid, which contains the photometry data we'll use from PanSTARRS, and

* gaiadr2.panstarrs1_best_neighbour, which we'll use to cross-match each star observed by Gaia with the same star observed by PanSTARRS.

We can use `load_table` (not `load_tables`) to get the metadata for a single table.  The name of this function is misleading, because it only downloads metadata. 

In [5]:
table = Gaia.load_table('gaiadr2.gaia_source')
table

Retrieving table 'gaiadr2.gaia_source'
Parsing table 'gaiadr2.gaia_source'...
Done.


<astroquery.utils.tap.model.taptable.TapTableMeta at 0x7f5dca48fa30>

Notice one gotcha: in the list of table names, this table appears as `gaiadr2.gaiadr2.gaia_source`, but when we load the metadata, we refer to it as `gaiadr2.gaia_source`.

Jupyter shows that the result is an object of type `TapTableMeta`, but it does not display the contents.

To see the metadata, we have to print the object.

In [6]:
print(table)

TAP Table name: gaiadr2.gaiadr2.gaia_source
Description: This table has an entry for every Gaia observed source as listed in the
Main Database accumulating catalogue version from which the catalogue
release has been generated. It contains the basic source parameters,
that is only final data (no epoch data) and no spectra (neither final
nor epoch).
Num. columns: 96


The following loop prints the names of the columns in the table.

In [9]:
for column in table.columns:
    print(column.name)

solution_id
designation
source_id
random_index
ref_epoch
ra
ra_error
dec
dec_error
parallax
parallax_error
parallax_over_error
pmra
pmra_error
pmdec
pmdec_error
ra_dec_corr
ra_parallax_corr
ra_pmra_corr
ra_pmdec_corr
dec_parallax_corr
dec_pmra_corr
dec_pmdec_corr
parallax_pmra_corr
parallax_pmdec_corr
pmra_pmdec_corr
astrometric_n_obs_al
astrometric_n_obs_ac
astrometric_n_good_obs_al
astrometric_n_bad_obs_al
astrometric_gof_al
astrometric_chi2_al
astrometric_excess_noise
astrometric_excess_noise_sig
astrometric_params_solved
astrometric_primary_flag
astrometric_weight_al
astrometric_pseudo_colour
astrometric_pseudo_colour_error
mean_varpi_factor_al
astrometric_matched_observations
visibility_periods_used
astrometric_sigma5d_max
frame_rotator_object_type
matched_observations
duplicated_source
phot_g_n_obs
phot_g_mean_flux
phot_g_mean_flux_error
phot_g_mean_flux_over_error
phot_g_mean_mag
phot_bp_n_obs
phot_bp_mean_flux
phot_bp_mean_flux_error
phot_bp_mean_flux_over_error
phot_bp_mean_ma

To find out what the columns mean, you can read [the documentation of this table here](https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html).

## Writing queries

Now you might be wondering how we actually download the data.  With tables this big, you generally don't.  Instead, you use queries to select only the data you want.

To do that, we're going to write an ADQL query.  Here's an example:

In [11]:
query1 = """SELECT 
TOP 10
source_id, ref_epoch, ra, dec, parallax 
FROM gaiadr2.gaia_source"""

The words in uppercase are ADQL keywords:

* `SELECT` indicates that we are selecting data (as opposed to adding or modifying data).

* `TOP` indicates that we only want the first 10 rows of the table, which is useful for testing a query before asking for all of the data.

* `FROM` specifies which table we want data from.

The third line is a list of column names, indicating which columns we want.  

I wrote the column names in lowercase to make it clear that they are not keywords.  This is a common style, but it is not required.
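
If you want to experiment with the structure of a `SELECT` statement without contacting the Gaia server, the following sketch runs a similar query against a small local SQLite table. It rests on two assumptions: the table contents are invented, and SQLite limits rows with `LIMIT` at the end of the query rather than with `TOP`, which is one of the places where SQL dialects differ from ADQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gaia_source (source_id INTEGER, ra REAL, dec REAL, parallax REAL)")

# Fill the table with 100 rows of made-up values.
rows = [(i, 10.0 + i, -5.0 + i, 0.1 * i) for i in range(100)]
conn.executemany("INSERT INTO gaia_source VALUES (?, ?, ?, ?)", rows)

# Select three of the four columns and at most 10 rows.
local_query = """SELECT
source_id, ra, parallax
FROM gaia_source
LIMIT 10"""

result = conn.execute(local_query).fetchall()
print(len(result))      # number of rows returned
print(len(result[0]))   # number of columns in each row

conn.close()
```

The list of column names controls the width of the result, and the row limit controls its length.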

To run this query, we use `Gaia.launch_job`:

In [12]:
job1 = Gaia.launch_job(query1)
job1

<astroquery.utils.tap.model.job.Job at 0x7f5dc9ade4c0>

The result is an object that represents the job running on a Gaia server.

If you print it, it displays metadata for the forthcoming table.

In [13]:
print(job1)

<Table length=10>
           name            dtype  unit                            description                            
------------------------- ------- ---- ------------------------------------------------------------------
                source_id   int64      Unique source identifier (unique within a particular Data Release)
                ref_epoch float64   yr                                                    Reference epoch
                       ra float64  deg                                                    Right ascension
                      dec float64  deg                                                        Declination
                 parallax float64  mas                                                           Parallax
astrometric_n_good_obs_al   int32                                          Number of good observations AL
Jobid: None
Phase: COMPLETED
Owner: None
Output file: sync_20200727133344.xml.gz
Results: None


Now we can get the results:


In [15]:
results1 = job1.get_results()
type(results1)

astropy.table.table.Table

The result is an [Astropy Table](https://docs.astropy.org/en/stable/table/), which is similar to a table in an SQL database except:

* SQL databases are stored on disk drives, so they are persistent; that is, they "survive" even if you turn off the computer.  An Astropy `Table` is stored in memory; it disappears when you turn off the computer (or shut down this Jupyter notebook).

* SQL databases are designed to process queries.  An Astropy `Table` can perform some query-like operations, like selecting columns and rows.  But these operations use Python syntax, not SQL.

Jupyter knows how to display the contents of a `Table`.

In [16]:
results1

source_id,ref_epoch,ra,dec,parallax,astrometric_n_good_obs_al
Unnamed: 0_level_1,yr,deg,deg,mas,Unnamed: 5_level_1
int64,float64,float64,float64,float64,int32
5778048606007762688,2015.5,252.9342027623728,-76.36404355513356,0.0073540961896056,233
5778089219220100096,2015.5,252.72779858159535,-75.86754589201404,0.2558129674996651,253
5778038302382901760,2015.5,253.7304844225416,-75.97044898678163,0.19971165734613,179
5778065137341689856,2015.5,252.6801881425296,-75.94965439295241,0.7703449958336754,241
5778053725609483136,2015.5,252.12098597753152,-76.28043425673263,0.2785262758494972,248
5778073521116914176,2015.5,251.2555423220548,-75.99925834891171,0.3851764561721036,283
5778088291507103360,2015.5,252.9473667300663,-75.89270654732523,-0.0489400870855034,266
5778084408856419584,2015.5,253.1933563035603,-75.99089099992894,0.4475173051231056,253
5778106781342048512,2015.5,251.79775298608456,-75.69389584692645,2.714767898744601,141
5778043383329098368,2015.5,254.3454010558042,-75.84368312688588,0.1642702720562393,252


Each column has a name, units, and a data type.  

For example, the units of `ra` and `dec` are degrees, and their data type is `float64`, which is a 64-bit floating-point number, used to store measurements with a fraction part.

**Exercise:** Read [the documentation of this table](https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html) and select at least one column name that looks interesting to you.  Add the columns you selected to the query and run it again.  What are the units of the column you selected?  What is its data type?

## Asynchronous queries

`launch_job` asks the server to run the job "synchronously", which normally means it runs immediately.  But synchronous jobs are limited to 2000 rows.  For queries that return more rows, you have to run them "asynchronously", which means they might take longer to get started.

The results of an asynchronous query are stored in a file on the server, so you can start a query and come back later to get the results.

For anonymous users, files are kept for three days.

We could run the same query asynchronously, but to liven things up, let's add a new keyword, `WHERE`:

In [21]:
query2 = """SELECT 
TOP 100
source_id, ref_epoch, ra, dec, parallax
FROM gaiadr2.gaia_source
WHERE parallax < 1
"""

A `WHERE` clause indicates which rows we want; in this case, the query selects only rows "where" `parallax` is less than 1.

We use `launch_job_async` to submit an asynchronous query.

In [22]:
job2 = Gaia.launch_job_async(query2)
print(job2)

INFO: Query finished. [astroquery.utils.tap.core]
<Table length=100>
   name    dtype  unit                            description                            
--------- ------- ---- ------------------------------------------------------------------
source_id   int64      Unique source identifier (unique within a particular Data Release)
ref_epoch float64   yr                                                    Reference epoch
       ra float64  deg                                                    Right ascension
      dec float64  deg                                                        Declination
 parallax float64  mas                                                           Parallax
Jobid: 1595873025110O
Phase: COMPLETED
Owner: None
Output file: async_20200727140345.vot
Results: None


And here are the results.

In [23]:
results2 = job2.get_results()
results2

source_id,ref_epoch,ra,dec,parallax
Unnamed: 0_level_1,yr,deg,deg,mas
int64,float64,float64,float64,float64
252774600785802624,2015.5,69.99626154666085,43.7101784985928,0.02633576666903798
252734880930669184,2015.5,68.57304083666482,44.376631718750076,0.26340009383527696
252746833820937472,2015.5,69.08000558938154,44.7603067851397,0.06365564389354074
252822012931739776,2015.5,70.14333865655641,44.159041645162205,0.12099182852635189
252757760217742080,2015.5,68.8528632228467,44.76229208021387,0.6176308559754744
252797862330831616,2015.5,69.97841070190951,44.05438980855108,0.6180097666020667
252730444227447296,2015.5,68.5884205700954,44.25683422132962,0.6247538128165461
252767518385901312,2015.5,69.94297847266124,43.62473260211258,0.16206466767726954
252776314479222912,2015.5,70.22691444936974,43.81803055599712,-1.977782665950659
...,...,...,...,...


Asynchronous jobs have a `jobid`.

In [25]:
job1.jobid, job2.jobid

(None, '1595873025110O')

You can use the `jobid` to remove the job from the server.

In [26]:
Gaia.remove_jobs([job2.jobid])

Removed jobs: '['1595873025110O']'.


**Exercise:** The clauses in a query have to be in the right order.  Go back and change the order of the clauses in `query2` and run it again.  

The query should fail, but notice that you don't get much useful debugging information.  

For this reason, developing and debugging ADQL queries can be really hard.  A few suggestions that might help:

* Whenever possible, start with a working query, either an example you find online or a query you have used in the past.

* Make small changes and test each change before you continue.

* While you are debugging, use `TOP` to limit the number of rows in the result.  That will make each attempt run faster, which reduces your testing time.
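
The sketch below illustrates the clause-order problem locally, using SQLite as a stand-in for the Gaia server. It only shows the general behavior (a query with clauses out of order is rejected); the error reported by an ADQL service will be worded differently, and the table here is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (source_id INTEGER, parallax REAL)")
conn.execute("INSERT INTO stars VALUES (1, 0.5)")

good_query = "SELECT source_id FROM stars WHERE parallax < 1"
bad_query = "SELECT source_id WHERE parallax < 1 FROM stars"  # clauses out of order

# The working query returns the one matching row.
good_rows = conn.execute(good_query).fetchall()
print(good_rows)

# The reordered query is rejected before any rows are read.
try:
    conn.execute(bad_query)
    error_message = None
except sqlite3.OperationalError as err:
    error_message = str(err)

print(error_message)
conn.close()
```

Starting from `good_query` and changing one thing at a time makes it much easier to see which change introduced the error.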

**Exercise:**  In a `WHERE` clause, you can use any of the comparison operators:

* `>`: greater than
* `<`: less than
* `>=`: greater than or equal
* `<=`: less than or equal
* `=`: equal
* `<>`: not equal

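Most of these are the same as in Python, but two are not: equality is written `=` rather than `==`, and the not-equal operator listed here is `<>` rather than Python's `!=`.  Be careful to keep your Python out of your ADQL!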

You can combine comparisons using the logical operators:

* AND: true if both comparisons are true
* OR: true if either or both comparisons are true
* NOT: true if the comparison that follows is false
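
To see how these operators behave, here is a small sketch that evaluates a few conditions against a four-row SQLite table. The table, its values, and the helper `select_ids` are all invented for illustration; the operators themselves are written the same way in ADQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (source_id INTEGER, parallax REAL, bp_rp REAL)")
conn.executemany("INSERT INTO stars VALUES (?, ?, ?)", [
    (1, 0.5, 0.3),
    (2, 2.0, 1.1),
    (3, 0.2, 2.5),
    (4, -0.1, -1.0),
])

def select_ids(condition):
    """Return the source_ids of the rows that satisfy a WHERE condition."""
    sql = "SELECT source_id FROM stars WHERE " + condition + " ORDER BY source_id"
    return [row[0] for row in conn.execute(sql)]

print(select_ids("parallax < 1 AND bp_rp > 0"))  # both comparisons must hold
print(select_ids("parallax > 1 OR bp_rp > 2"))   # at least one must hold
print(select_ids("NOT parallax < 1"))            # inverts the comparison
print(select_ids("source_id <> 2"))              # not equal
print(select_ids("source_id = 2"))               # equal, with a single =
```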

[Read about SQL operators here](https://www.w3schools.com/sql/sql_operators.asp) and then modify the previous query to select rows where `bp_rp` is between `-0.75` and `2`.


## Formatting queries

So far the queries have been string "literals", meaning that the entire string is part of the program.
But writing queries yourself can be slow, repetitive, and error-prone.

It is often a good idea to write Python code that assembles a query for you.  One useful tool for that is the [string `format` method](https://www.w3schools.com/python/ref_string_format.asp).

As an example, here's a list of columns we might want to select.

In [42]:
names = 'source_id, ra, dec, pmra, pmdec, parallax, parallax_error'

The following is a "base" for a query; it's a string that contains at least one format specifier in curly brackets (braces).

In [51]:
query3_base = """SELECT TOP 10
{columns}
FROM gaiadr2.gaia_source
WHERE parallax < 1 AND 
bp_rp BETWEEN -0.75 AND 2
"""

This base query contains one format specifier, `{columns}`.

To assemble the query, we invoke `format` on the base string and provide a keyword argument that assigns a value to `columns`.

In [52]:
query3 = query3_base.format(columns=names)

We'll use the following function to print multi-line queries readably.

In [47]:
def print_query(query):
    """Print an ADQL query readably.
    
    query: string
    """
    for line in query.split('\n'):
        print(line)

Here's the query we just assembled.

In [48]:
print_query(query3)

SELECT TOP 10
source_id, ra, dec, pmra, pmdec, parallax, parallax_error
FROM gaiadr2.gaia_source
WHERE parallax < 1 AND 
bp_rp BETWEEN -0.75 AND 2



The format specifier has been replaced with the value of `names`.
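
A variation you might find convenient is to keep the column names in a Python list and join them into a string just before formatting. This is only a sketch; the names `column_list` and `base` are introduced here for illustration.

```python
# Keep the columns in a list, which is easy to extend or reorder.
column_list = ['source_id', 'ra', 'dec', 'pmra', 'pmdec']

# Join the names into the comma-separated form the query needs.
columns = ', '.join(column_list)

base = """SELECT TOP 10
{columns}
FROM gaiadr2.gaia_source
"""

query = base.format(columns=columns)
print(query)
```

With this approach, adding a column is a one-word change to the list, and there is less chance of a stray or missing comma.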

Let's run it and see if it works:

In [49]:
job3 = Gaia.launch_job(query3)
print(job3)

<Table length=10>
     name       dtype    unit                              description                            
-------------- ------- -------- ------------------------------------------------------------------
     source_id   int64          Unique source identifier (unique within a particular Data Release)
            ra float64      deg                                                    Right ascension
           dec float64      deg                                                        Declination
          pmra float64 mas / yr                         Proper motion in right ascension direction
         pmdec float64 mas / yr                             Proper motion in declination direction
      parallax float64      mas                                                           Parallax
parallax_error float64      mas                                         Standard error of parallax
Jobid: None
Phase: COMPLETED
Owner: None
Output file: sync_20200727144812.xml.gz
Results: N

In [50]:
results3 = job3.get_results()
results3

source_id,ra,dec,pmra,pmdec,parallax,parallax_error
Unnamed: 0_level_1,deg,deg,mas / yr,mas / yr,mas,mas
int64,float64,float64,float64,float64,float64,float64
252774600785802624,69.99626154666085,43.7101784985928,-2.278426834901382,-0.4768141384453612,0.0263357666690379,0.4824054745507884
252734880930669184,68.57304083666482,44.37663171875008,0.8391415224655223,-0.8983715446697241,0.2634000938352769,0.1231851977715751
252746833820937472,69.08000558938154,44.7603067851397,0.137117815974135,-1.3481109353588128,0.0636556438935407,0.3068505637052745
252822012931739776,70.14333865655641,44.159041645162205,1.1734554738766216,1.1623554504181817,0.1209918285263518,0.3027910444433995
252757760217742080,68.8528632228467,44.76229208021387,-3.168738245862547,-2.602632073172504,0.6176308559754744,0.0800266694970515
252797862330831616,69.97841070190951,44.05438980855108,4.527304898262951,-6.615270208874573,0.6180097666020667,0.0813013105628765
252730444227447296,68.5884205700954,44.25683422132962,3.007898914090106,-0.874584161010912,0.6247538128165461,0.2798106539457499
252767518385901312,69.94297847266124,43.62473260211258,1.2011583302404267,-0.2954402412112242,0.1620646676772695,0.3391367075491901
252776314479222912,70.22691444936974,43.81803055599712,-3.884960242513261,-1.8719420855501336,-1.977782665950659,0.7869335875017431
252766693752143872,69.77245528846072,43.58880899498524,-0.4106144927442899,1.1873476456668934,-0.398554976578538,0.4154136264241899


Good so far.

**Exercise:** This query always selects sources with `parallax` less than 1.  But suppose you want to take that upper bound as an input.

Modify `query3_base` to replace `1` with a format specifier like `{max_parallax}`.  Now, when you call `format`, add a keyword argument that assigns a value to `max_parallax`, and confirm that the format specifier gets replaced with the value you provide.

**Style note:**  You might notice that the variable names in this notebook are numbered, like `query1`, `query2`, etc.  

The advantage of this style is that it isolates each section of the notebook from the others, so if you go back and run the cells out of order, it's less likely that you will get unexpected interactions.

A drawback of this style is that it can be a nuisance to update the notebook if you add, remove, or reorder a section.

What do you think of this choice?  Are there alternatives you prefer?

## Selecting a region

One of the most common ways to restrict a query is to select stars in a particular region of the sky.

For example, here's a query from the [Gaia archive documentation](https://gea.esac.esa.int/archive-help/adql/examples/index.html) that selects "all the objects ... in a circular region centered at (266.41683, -29.00781) with a search radius of 5 arcmin (0.08333 deg)."

In [64]:
query = """
SELECT 
TOP 10 source_id
FROM gaiadr2.gaia_source
WHERE 1=CONTAINS(
  POINT(ra, dec),
  CIRCLE(266.41683, -29.00781, 0.08333333))
"""

This query uses three keywords that are specific to ADQL (not SQL):

* `POINT`: a location in [ICRS coordinates](https://en.wikipedia.org/wiki/International_Celestial_Reference_System), specified in degrees of right ascension and declination.

* `CIRCLE`: a circle where the first two values are the coordinates of the center and the third is the radius in degrees.

* `CONTAINS`: a function that returns `1` if a `POINT` is contained in a shape and `0` otherwise.

Here is the [documentation of `CONTAINS`](http://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html#tth_sEc4.2.12).

A query like this is called a cone search because it selects stars in a cone.
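
Conceptually, the test `CONTAINS(POINT(...), CIRCLE(...))` asks whether the angular separation between a point and the center of the circle is no larger than the radius. The sketch below computes that separation in plain Python with the haversine formula. It is meant to illustrate the geometry, not to describe how the server evaluates the query, and the function names and test points are made up for this example.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees; all arguments are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which behaves well numerically for small angles.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def in_circle(ra, dec, center_ra, center_dec, radius_deg):
    """Return True if the point lies within the circle on the sky."""
    return angular_separation_deg(ra, dec, center_ra, center_dec) <= radius_deg

center_ra, center_dec = 266.41683, -29.00781
radius_deg = 5 / 60   # 5 arcmin expressed in degrees

# Two test points at the same right ascension as the center.
near = in_circle(266.41683, -29.05, center_ra, center_dec, radius_deg)
far = in_circle(266.41683, -29.20, center_ra, center_dec, radius_deg)
print(radius_deg, near, far)
```

The first test point is about 0.04 degrees from the center, inside the 5 arcmin radius; the second is about 0.19 degrees away, outside it.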

In [65]:
job = Gaia.launch_job(query)
result = job.get_results()
result

source_id
int64
4057468321929794432
4057468287575835392
4057482027171038976
4057470349160630656
4057470039924301696
4057469868125641984
4057468351995073024
4057469661959554560
4057470520960672640
4057470555320409600


**Exercise:** When you are debugging queries like this, you can use `TOP` to limit the size of the results, but then you still don't know how big the results will be.

An alternative is to use `COUNT`, which asks for the number of rows that would be selected, but it does not return them.

In the previous query, replace `TOP 10 source_id` with `COUNT(source_id)` and run the query again.  How many stars has Gaia identified in the cone we searched?
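
If you want to see the shape of a `COUNT` result before trying it on the server, here is a sketch using a local SQLite table with invented values. The result is a single row containing a single number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stars (source_id INTEGER, parallax REAL)")

# 500 made-up rows with parallax values 0.00, 0.01, ..., 4.99.
conn.executemany("INSERT INTO stars VALUES (?, ?)",
                 [(i, i / 100) for i in range(500)])

# COUNT returns one row with one value: the number of matching rows.
(n_matching,) = conn.execute(
    "SELECT COUNT(source_id) FROM stars WHERE parallax < 1").fetchone()
print(n_matching)

conn.close()
```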

## Working with coordinates

The next step is to select a viewing rectangle, but before we do that, we have to deal with coordinates.

In [24]:
import astropy.units as u

low1, high1 = -55, -45
phi1 = [low1, low1, high1, high1] * u.deg

low2, high2 = -5, 5
phi2 = [low2, high2, high2, low2] * u.deg

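`gc.GD1Koposov10` is the [coordinate class for the GD-1 coordinate system](https://gala-astro.readthedocs.io/en/latest/_modules/gala/coordinates/gd1.html), provided by Gala and built on Astropy's coordinate framework.  In this system, `phi1` is the angle along the stream and `phi2` is the angle perpendicular to it.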

In [25]:
import gala.coordinates as gc

corners = gc.GD1Koposov10(phi1=phi1, phi2=phi2)
type(corners)

gala.coordinates.gd1.GD1Koposov10

In [26]:
corners

<GD1Koposov10 Coordinate: (phi1, phi2) in deg
    [(-55., -5.), (-55.,  5.), (-45.,  5.), (-45., -5.)]>
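
The four corners come from pairing the elements of `phi1` and `phi2` position by position. The following sketch shows the same pairing with plain Python lists, without units; the name `corner_points` is introduced here for illustration.

```python
low1, high1 = -55, -45
low2, high2 = -5, 5

# The same corner ordering as above, without units.
phi1 = [low1, low1, high1, high1]
phi2 = [low2, high2, high2, low2]

# Pair the coordinates position by position to get the corner points.
corner_points = list(zip(phi1, phi2))
print(corner_points)
```

The order matters: listing the corners this way walks around the perimeter of the rectangle, so consecutive points are joined by edges rather than by diagonals.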

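Next we transform the corners to the [International Celestial Reference System](https://en.wikipedia.org/wiki/International_Celestial_Reference_System) (ICRS), which is the coordinate system of the `ra` and `dec` columns we use in queries.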

In [27]:
import astropy.coordinates as coord

corners_icrs = corners.transform_to(coord.ICRS)
type(corners_icrs)

astropy.coordinates.builtin_frames.icrs.ICRS

In [28]:
corners_icrs

<ICRS Coordinate: (ra, dec) in deg
    [(143.65740786, 20.98189113), (134.46717444, 26.39291777),
     (140.58825494, 34.85481377), (150.16628418, 29.01557079)]>

In [29]:
corners_icrs[0]

<ICRS Coordinate: (ra, dec) in deg
    (143.65740786, 20.98189113)>

In [30]:
corners_icrs[0].ra

<Longitude 143.65740786 deg>

In [31]:
corners_icrs[0].ra.degree

143.65740785846373

In [32]:
corners_icrs.ra

<Longitude [143.65740786, 134.46717444, 140.58825494, 150.16628418] deg>

In [33]:
corners_icrs.ra.degree

array([143.65740786, 134.46717444, 140.58825494, 150.16628418])

We can use `corners_icrs` to specify a polygon and construct a more complex query.
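
One way to do that relies on a feature of `str.format`: a replacement field can index into a sequence, as in `{ra[0]}`. Here is a minimal sketch with plain lists holding rounded corner values; the name `polygon_base` is made up for this example.

```python
# Rounded corner values, for illustration only.
ra = [143.657, 134.467, 140.588, 150.166]
dec = [20.982, 26.393, 34.855, 29.016]

# Each field picks one element out of the sequence passed to format.
polygon_base = """POLYGON({ra[0]}, {dec[0]},
        {ra[1]}, {dec[1]},
        {ra[2]}, {dec[2]},
        {ra[3]}, {dec[3]})"""

polygon = polygon_base.format(ra=ra, dec=dec)
print(polygon)
```

The same indexing works with any object that supports integer indexing, which includes NumPy arrays.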

In [34]:
query4_base = """SELECT {columns}
FROM gaiadr2.gaia_source
WHERE parallax < 1 AND bp_rp > -0.75 AND bp_rp < 2 AND
      CONTAINS(POINT(ra, dec), 
               POLYGON({ra[0]}, {dec[0]}, 
                       {ra[1]}, {dec[1]}, 
                       {ra[2]}, {dec[2]}, 
                       {ra[3]}, {dec[3]})) = 1
"""

Here's what it looks like.

In [35]:
query4 = query4_base.format(columns=names, 
                            ra=corners_icrs.ra.degree,
                            dec=corners_icrs.dec.degree)
print_query(query4)

SELECT source_id, ra, dec, pmra, pmdec, parallax, parallax_error
FROM gaiadr2.gaia_source
WHERE parallax < 1 AND bp_rp > -0.75 AND bp_rp < 2 AND
      CONTAINS(POINT(ra, dec), 
               POLYGON(143.65740785846373, 20.98189112798802, 
                       134.46717444171475, 26.39291776724364, 
                       140.58825494277238, 34.85481376928442, 
                       150.16628417989443, 29.015570791894923)) = 1



And here's how we run it.

In [36]:
job4 = Gaia.launch_job_async(query4)
print(job4)

INFO: Query finished. [astroquery.utils.tap.core]
<Table length=120756>
     name       dtype    unit                              description                            
-------------- ------- -------- ------------------------------------------------------------------
     source_id   int64          Unique source identifier (unique within a particular Data Release)
            ra float64      deg                                                    Right ascension
           dec float64      deg                                                        Declination
          pmra float64 mas / yr                         Proper motion in right ascension direction
         pmdec float64 mas / yr                             Proper motion in declination direction
      parallax float64      mas                                                           Parallax
parallax_error float64      mas                                         Standard error of parallax
Jobid: 1595337353369O
Phase: COMPLETE

In [37]:
results4 = job4.get_results()
len(results4)

120756

## Saving results

In [38]:
filename = 'data/gd1_results4.fits'
results4.write(filename, overwrite=True)

In [39]:
import os

def filesize(filename):
    """Print the size of a file in MB (here 1 MB = 1024 * 1024 bytes)."""
    size = os.path.getsize(filename)
    print(size / 1024 / 1024, 'MB')

In [40]:
filesize(filename)

6.46270751953125 MB


In [41]:
from astropy.table import Table

filename = 'data/gd1_results4.fits'
results4 = Table.read(filename)

In [42]:
results4.info

<Table length=120756>
     name       dtype    unit                              description                            
-------------- ------- -------- ------------------------------------------------------------------
     source_id   int64          Unique source identifier (unique within a particular Data Release)
            ra float64      deg                                                    Right ascension
           dec float64      deg                                                        Declination
          pmra float64 mas / yr                         Proper motion in right ascension direction
         pmdec float64 mas / yr                             Proper motion in declination direction
      parallax float64      mas                                                           Parallax
parallax_error float64      mas                                         Standard error of parallax

## Making a function

In [43]:
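def transform_rectangle(low1, high1, low2, high2):
    """Transform the corners of a rectangle from GD-1 to ICRS coordinates.

    low1, high1: bounds on phi1 in degrees
    low2, high2: bounds on phi2 in degrees
    """
    phi1 = [low1, low1, high1, high1] * u.deg
    phi2 = [low2, high2, high2, low2] * u.deg
    corners = gc.GD1Koposov10(phi1=phi1, phi2=phi2)
    corners_icrs = corners.transform_to(coord.ICRS)
    return corners_icrs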

In [44]:
corners_icrs = transform_rectangle(-55, -45, -4, 6)

In [45]:
# Compare with a tolerance; exact equality of floats is fragile
assert abs(corners_icrs[0].ra.degree - 142.7716385024318) < 1e-9

In [46]:
point_base = "{point.ra.degree}, {point.dec.degree}"

t = [point_base.format(point=point)
     for point in corners_icrs]
print(t)

['142.7716385024318, 21.546353324354072', '133.5042010718342, 26.902060612630827', '139.5603146332043, 35.39626209390598', '149.26169474640096, 29.63037759884082']


In [47]:
poly_base = "POLYGON({point_list})"

point_list = ', '.join(t)
poly_base.format(point_list=point_list)

'POLYGON(142.7716385024318, 21.546353324354072, 133.5042010718342, 26.902060612630827, 139.5603146332043, 35.39626209390598, 149.26169474640096, 29.63037759884082)'

In [48]:
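def make_adql_polygon(coords):
    """Make an ADQL POLYGON expression from a sequence of coordinates.

    coords: sequence of coordinates with `ra` and `dec` attributes

    returns: string
    """
    # Format each point as "ra, dec" in degrees
    point_base = "{point.ra.degree}, {point.dec.degree}"

    t = [point_base.format(point=point)
         for point in coords]

    # Join the points and wrap them in POLYGON(...)
    poly_base = "POLYGON({point_list})"
    point_list = ', '.join(t)
    return poly_base.format(point_list=point_list)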

In [49]:
polygon1 = make_adql_polygon(corners_icrs)

In [50]:
corners_icrs = transform_rectangle(-55, -45, -5, 5)
polygon2 = make_adql_polygon(corners_icrs)
polygon2

'POLYGON(143.65740785846373, 20.98189112798802, 134.46717444171475, 26.39291776724364, 140.58825494277238, 34.85481376928442, 150.16628417989443, 29.015570791894923)'
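
As a quick check of the string-building logic that does not require Astropy or Gala, the sketch below runs the same function on stand-in points built with `SimpleNamespace`. The helper `fake_point`, the triangle coordinates, and the `WHERE` clause template are hypothetical and only illustrate how the resulting string could be placed in a query.

```python
from types import SimpleNamespace

def make_adql_polygon(coords):
    """Make an ADQL POLYGON expression from a sequence of coordinates."""
    point_base = "{point.ra.degree}, {point.dec.degree}"
    t = [point_base.format(point=point) for point in coords]
    return "POLYGON({point_list})".format(point_list=', '.join(t))

def fake_point(ra_deg, dec_deg):
    """Stand-in for a coordinate object with ra.degree and dec.degree."""
    return SimpleNamespace(ra=SimpleNamespace(degree=ra_deg),
                           dec=SimpleNamespace(degree=dec_deg))

# A made-up triangle on the sky.
coords = [fake_point(10.0, 20.0), fake_point(10.0, 30.0), fake_point(20.0, 30.0)]
polygon = make_adql_polygon(coords)
print(polygon)

# The polygon string can then be dropped into a query template.
where_base = "WHERE 1 = CONTAINS(POINT(ra, dec), {polygon})"
where_clause = where_base.format(polygon=polygon)
print(where_clause)
```

The replacement field `{point.ra.degree}` follows attributes of the object passed to `format`, which is why simple stand-in objects are enough to exercise the function.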