In [19]:
#-------- eJWST ----------------#
from astroquery.esa.jwst import Jwst

#------------- Gaia ---------#
from astroquery.gaia import Gaia

#---------- NED -----------#
from astroquery.ipac.ned import Ned

from astropy.coordinates import SkyCoord
from astropy import coordinates as coords
import astropy.units as u
import numpy as np
import pandas as pd


# Information

## Quick Description of astroquery
**astroquery** (https://astroquery.readthedocs.io/en/latest/) is a Python library designed to simplify querying astronomical databases and archives. 
It provides a unified interface for retrieving data from a variety of sources, so you can work with astronomical data without dealing with each service's specific API directly.\
Astroquery is organized into modules, each corresponding to a specific astronomical archive or service. Each module typically contains a class that handles the queries and interactions with that particular service.
Common methods include query_object for querying by object name and query_region for querying by celestial coordinates. Modules for TAP-based archives such as Gaia and eJWST additionally accept ADQL queries, for example through launch_job.

**ADQL** (Astronomical Data Query Language) is a SQL-like language for selecting and filtering data by astronomical coordinates and other criteria. \
Knowing what can be retrieved with ADQL requires knowing the schema (tables and columns) and the capabilities of the archive you are querying. This information can usually be found in the archive's documentation or web interface.
Some archives also provide web-based ADQL query builders that let you construct queries interactively (e.g., eJWST).
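
As a rough illustration of what such a query looks like, the sketch below assembles an ADQL cone search as a plain string. The helper name build_cone_query and its parameters are hypothetical (they are not part of astroquery); the table and column names are the Gaia ones used later in this notebook, and the position is the M87 position returned by NED further down. The CONTAINS/POINT/CIRCLE functions are standard ADQL geometry functions.

```python
def build_cone_query(table, columns, ra_deg, dec_deg, radius_deg,
                     top=100, ra_col="ra", dec_col="dec"):
    """Return an ADQL cone-search query string (hypothetical helper).

    Assumes the table stores ICRS positions in degrees in the columns
    named by ra_col and dec_col.
    """
    if not columns:
        raise ValueError("at least one column is required")
    if radius_deg <= 0:
        raise ValueError("radius_deg must be positive")
    column_list = ", ".join(columns)
    return (
        f"SELECT TOP {int(top)} {column_list} "
        f"FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', {ra_col}, {dec_col}), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


# Ten Gaia DR3 sources within 0.1 deg of the M87 position reported by NED.
query = build_cone_query(
    "gaiadr3.gaia_source", ["source_id", "ra", "dec"],
    ra_deg=187.70593, dec_deg=12.39112, radius_deg=0.1, top=10,
)
print(query)
```

The resulting string can be passed to Gaia.launch_job_async in the same way as the hand-written queries below.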

Services Supporting ADQL:
- ESA Archives: Some archives, such as Gaia and eJWST, support ADQL queries. 
    - Gaia Archive: Provides access to data from the Gaia mission, offering high-precision astrometry and photometry. **astroquery Package: astroquery.gaia**
    - eJWST: Contains data from the James Webb Space Telescope, offering observations in the infrared spectrum. **astroquery Package: astroquery.esa.jwst**
    - XMM-Newton Archive: Contains data from the XMM-Newton observatory, which observes the X-ray sky. **astroquery Package: astroquery.esa.xmm_newton**
- SIMBAD: Provides detailed information about astronomical objects, including their identifiers, positions, classifications, and bibliographic references. **astroquery Package: astroquery.simbad**

Services Without ADQL Support:
- NED (NASA/IPAC Extragalactic Database): Does not natively support ADQL queries. Instead, NED offers its own query interfaces and tools for accessing information about extragalactic objects. **astroquery Package: astroquery.ipac.ned** 
- MAST (The Mikulski Archive for Space Telescopes): Collects and archives a variety of scientific data to support the astronomical community.\
 MAST offers single mission-based queries as well as cross-mission queries.  **astroquery Package: astroquery.mast**

In the following I provide example queries using ADQL for Gaia and eJWST, as well as simple queries for the NED database.


## Authenticated access


Authenticated access via a COSMOS account offers the following benefits:

- **Persistent Results**: Results are saved in a private user area on the server. This means that once a job is completed, you can access the results at any time without re-running the query, which is particularly useful for large datasets. Retrieving finished jobs is faster than running a new query, saving time and resources.

- **Increased Capacity**: Authenticated access provides higher quotas and more capacity for handling larger queries. This helps prevent query timeouts and allows you to work efficiently with extensive data.

To obtain authenticated access, you need to create a COSMOS account, which is free and straightforward (see for example: https://www.cosmos.esa.int/web/jwst-archive/registration). COSMOS is the European Space Agency's (ESA) system for managing access to their archives, including services like Gaia and JWST.
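
Since the credentials are passed to the login call as plain strings, it can help to keep them out of the notebook itself. The following is a minimal sketch of one way to do that, assuming the credentials are stored in environment variables; the variable names COSMOS_USER and COSMOS_PASSWORD and both helper functions are hypothetical choices for this example, not something astroquery defines.

```python
import os
from getpass import getpass


def get_credentials(env=None, prompt=getpass):
    """Read COSMOS credentials from the environment (hypothetical helper).

    Falls back to an interactive password prompt if only the user name is set.
    """
    env = os.environ if env is None else env
    user = env.get("COSMOS_USER")          # assumed variable name
    password = env.get("COSMOS_PASSWORD")  # assumed variable name
    if not user:
        raise RuntimeError("Set the COSMOS_USER environment variable first.")
    if not password:
        password = prompt(f"COSMOS password for {user}: ")
    return user, password


def login_to_archives(user, password):
    """Log in to eJWST and Gaia with the same COSMOS account."""
    # Imported lazily so get_credentials works without astroquery installed.
    from astroquery.esa.jwst import Jwst
    from astroquery.gaia import Gaia

    Jwst.login(user=user, password=password)
    Gaia.login(user=user, password=password)


# Demonstration with a fake environment; a real session would call
# get_credentials() with no arguments and then login_to_archives(user, password).
user, password = get_credentials({"COSMOS_USER": "Name", "COSMOS_PASSWORD": "secret"})
print(f"Would log in as {user}")
```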


In [25]:
#---------- Login with COSMOS account -----------#
Jwst.login(user='Name', password='Password!')  # replace with your COSMOS credentials
Gaia.login(user='Name', password='Password')   # replace with your COSMOS credentials

INFO: OK [astroquery.utils.tap.core]
INFO: Login to gaia TAP server [astroquery.gaia.core]
INFO: OK [astroquery.utils.tap.core]
INFO: Login to gaia data server [astroquery.gaia.core]
INFO: OK [astroquery.utils.tap.core]


## Example ADQL query JWST

ADQL queries start by selecting the columns that will appear in the output. Usually,
the column name is sufficient. If a name is ambiguous, prefix it with the table
name (table_name.column_name). The SELECT clause is also where the number of rows is
limited (here 500, via TOP 500).

For the JWST query I access two tables, jwst.observation and jwst.archive, and select the first 500 entries from each. I then retrieve the job ID of each query, which can be used later to fetch the results without running the job again. Finally, I convert both results to pandas dataframes and merge them into one dataframe on the observationid column. The merge shows that the first 500 entries differ between the two tables, likely because jwst.archive stores all calibration levels while jwst.observation does not.

In [39]:
#---------- ADQL queries -----------#
# using jwst.observation
observation = (
    "SELECT TOP 500 observationid, target_name, target_keywords, instrument_keywords, proposal_id "
    "FROM jwst.observation "
    "WHERE (target_keywords IS NOT NULL)"
)
job_observation = Jwst.launch_job(observation, async_job=True)
result_observation = job_observation.get_results()

# using jwst.archive
archive = (
    "SELECT TOP 500 observationid, calibrationlevel, instrument_name, public, dataproducttype, "
    "targetposition_coordinates_cval1, targetposition_coordinates_cval2 "
    "FROM jwst.archive "
    "WHERE ((jwst.archive.calibrationlevel = 3) OR (jwst.archive.calibrationlevel = -1))"
)
job_archive = Jwst.launch_job(archive, async_job=True)
result_archive = job_archive.get_results()

INFO: Query finished. [astroquery.utils.tap.core]
INFO: Query finished. [astroquery.utils.tap.core]


In [40]:
#----------- extract job IDs to retrieve the results later without re-running the ADQL ------------#
job_id_observation = job_observation.jobid
job_id_archive = job_archive.jobid
print(job_id_observation, job_id_archive)

# To retrieve a finished job later without executing the query again:
# old_job = Jwst.load_async_job(jobid=job_id)  # job_id: the ID of the job whose results you want
# result = old_job.get_results()

1723116522259O 1723116523183O
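
Job IDs only help in a later session if they are written down somewhere. The sketch below stores them in a small JSON file and reads them back. The helpers save_job_ids and load_job_ids and the file name jwst_jobs.json are hypothetical and chosen for illustration; the IDs are the ones printed above. A temporary directory is used here only so that the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path


def save_job_ids(path, **job_ids):
    """Merge the given label=job_id pairs into a JSON file and return all stored IDs."""
    path = Path(path)
    stored = json.loads(path.read_text()) if path.exists() else {}
    stored.update(job_ids)
    path.write_text(json.dumps(stored, indent=2))
    return stored


def load_job_ids(path):
    """Return the stored label -> job_id mapping, or an empty dict if the file is missing."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


with tempfile.TemporaryDirectory() as tmp:
    job_file = Path(tmp) / "jwst_jobs.json"
    save_job_ids(job_file, observation="1723116522259O")
    save_job_ids(job_file, archive="1723116523183O")
    restored = load_job_ids(job_file)

print(restored)
```

In a later, logged-in session a stored ID can then be passed to Jwst.load_async_job(jobid=...), as in the commented lines of the cell above.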


In [41]:
#------ put data into dataframe --------#
pandas_observation = result_observation.to_pandas()
pandas_archive = result_archive.to_pandas()

In [42]:
jwst_df = pd.merge(pandas_observation, pandas_archive, on='observationid', how='inner')
jwst_df

index,observationid,target_name,target_keywords,instrument_keywords,proposal_id,calibrationlevel,instrument_name,public,dataproducttype,targetposition_coordinates_cval1,targetposition_coordinates_cval2
0,jw04291-o002_s00413_nirspec_f170lp-g235h,Obs2_1080,SREGION=POLYGON ICRS 214.859171947 52.8369553...,APERTURE=NRS_FULL_MSA|DETECTOR=MULTIPLE|EXPEND...,4291,3,NIRSPEC/MSA,True,spectrum,214.859225,52.837084
1,jw02959-o002_t002_nirspec_g395h-f290lp,06355,SREGION=POLYGON ICRS 110.842197620 -73.435659...,APERTURE=NRS_FULL_IFU|DETECTOR=MULTIPLE|EXPEND...,2959,3,NIRSPEC/IFU,False,cube,110.844536,-73.435076
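
Because the two TOP 500 selections only partly overlap, the client-side merge keeps few rows. One possible alternative is to let the archive do the join, so that the row limit applies to the already-matched rows. The sketch below only builds such a query string. It assumes the eJWST TAP service accepts a JOIN between jwst.observation and jwst.archive on observationid, which has not been verified here; the helper name build_join_query and the table aliases o and a are hypothetical.

```python
def build_join_query(top=500):
    """Return an ADQL query joining jwst.observation and jwst.archive (sketch).

    Assumption: the service allows joining these two tables on observationid.
    """
    return (
        f"SELECT TOP {int(top)} "
        "o.observationid, o.target_name, o.proposal_id, "
        "a.calibrationlevel, a.instrument_name, a.dataproducttype "
        "FROM jwst.observation AS o "
        "JOIN jwst.archive AS a ON o.observationid = a.observationid "
        "WHERE o.target_keywords IS NOT NULL "
        "AND (a.calibrationlevel = 3 OR a.calibrationlevel = -1)"
    )


join_query = build_join_query(top=500)
print(join_query)
```

If the service accepts it, the string can be submitted with Jwst.launch_job(join_query, async_job=True) like the two queries above, and the result already contains columns from both tables.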


## Example ADQL query Gaia

Here I perform a single query on the gaiadr3.gaia_source table and convert the query result into a pandas dataframe.

In [32]:
#---------- ADQL queries -----------#
# using gaiadr3.gaia_source
gaia = "SELECT TOP 100 gaia.source_id, gaia.ra, gaia.dec, gaia.pmra, gaia.classprob_dsc_combmod_quasar FROM gaiadr3.gaia_source AS gaia"
job_gaia = Gaia.launch_job_async(gaia)
result_gaia = job_gaia.get_results()

INFO: Query finished. [astroquery.utils.tap.core]


In [33]:
pandas_gaia = result_gaia.to_pandas()
pandas_gaia

index,SOURCE_ID,ra,dec,pmra,classprob_dsc_combmod_quasar
0,5991615270691840512,242.442828,-43.547235,-5.527784,1.025633e-13
1,5991615270691849984,242.441421,-43.539854,-5.437843,1.022722e-13
2,5991615270693763072,242.436429,-43.545056,-2.459832,1.078649e-13
3,5991615270693763968,242.443102,-43.549471,3.587971,1.029555e-13
4,5991615270693764096,242.445660,-43.546384,-2.853520,1.888917e-11
...,...,...,...,...,...
95,4301016332250115328,294.944448,7.722656,,1.430440e-03
96,4301016332250121600,294.943810,7.726852,,1.019537e-04
97,4301016332251438848,294.948308,7.733534,,1.487106e-03
98,4301016332260870400,294.949012,7.720361,-2.000960,1.022203e-13


## Example Query NED

NED does not support ADQL queries. Instead, the astroquery NED module offers dedicated query methods to extract data. Image and spectra queries return their results as a list of HDUList objects, while everything else is returned as a Table. Here I show one query by object name and one by coordinates.

In [17]:
#-------------------- Query by object Name --------------------#
m87 = Ned.query_object("Messier 87")
m87

No.,Object Name,RA (degrees),DEC (degrees),Type,Velocity (km / s),Redshift,Redshift Flag,Magnitude and Filter,Separation (arcmin),References,Notes,Photometry Points,Positions,Redshift Points,Diameter Points,Associations
int32,str30,float64,float64,object,float64,float64,object,object,float64,int32,int32,int32,int32,int32,int32,int32
1,MESSIER 087,187.70593,12.39112,G,1284.0,0.004283,UUN,9.59,--,3364,55,506,122,77,14,1


In [20]:
#-------------------- Query by coordinates --------------------#
# Note: ra is given in degrees here (12.5138 deg, roughly 00h50m), not in hours
coordinates = coords.SkyCoord(ra=12.5138*u.deg, dec=12.3911*u.deg, frame='icrs')

#-------------------- Query NED for a region around the given coordinates --------------------#
result_coord = Ned.query_region(coordinates, radius=0.1 * u.deg)  # 0.1 degree radius
result_coord

No.,Object Name,RA (degrees),DEC (degrees),Type,Velocity (km / s),Redshift,Redshift Flag,Magnitude and Filter,Separation (arcmin),References,Notes,Photometry Points,Positions,Redshift Points,Diameter Points,Associations
int32,str30,float64,float64,object,float64,float64,object,object,float64,int32,int32,int32,int32,int32,int32,int32
1,WISEA J004938.88+122302.0,12.41202,12.38391,IrS,--,--,,,5.981,0,0,12,1,0,0,0
2,WISEA J004939.62+122341.6,12.41512,12.39489,IrS,--,--,,,5.788,0,0,12,1,0,0,0
3,WISEA J004940.16+122231.1,12.41736,12.37532,IrS,--,--,,,5.731,0,0,12,1,0,0,0
4,WISEA J004940.25+122336.5,12.41774,12.3935,IrS,--,--,,,5.632,0,0,12,1,0,0,0
5,WISEA J004940.37+122121.2,12.41822,12.35591,IrS,--,--,,,5.987,0,0,12,1,0,0,0
6,WISEA J004940.92+122332.2,12.42053,12.3923,IrS,--,--,,,5.467,0,0,12,1,0,0,0
7,WISEA J004941.05+122225.9,12.42107,12.37387,IrS,--,--,,,5.532,0,0,12,1,0,0,0
8,WISEA J004941.83+122352.8,12.42437,12.39791,IrS,--,--,,,5.257,0,0,18,2,0,0,0
9,WISEA J004941.86+122210.1,12.42444,12.36949,IrS,--,--,,,5.396,0,0,12,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
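
The RA value 12.5138 used in the coordinate query matches M87's right ascension when expressed in hours (about 12h30m49s), whereas the query interprets it as degrees, which is why the results are WISEA sources near RA 12.4 deg rather than objects around M87. If M87 was the intended target (an assumption, since the text does not say so), the RA needs to be converted from hours to degrees first. The sketch below does this with two small hypothetical helpers; one hour of right ascension corresponds to 15 degrees.

```python
def ra_hours_to_degrees(ra_hours):
    """Convert right ascension from decimal hours to degrees (1 h = 15 deg)."""
    if not 0 <= ra_hours < 24:
        raise ValueError("ra_hours must be in the range [0, 24)")
    return ra_hours * 15.0


def hms_to_degrees(hours, minutes, seconds):
    """Convert right ascension given as h, m, s to degrees."""
    return ra_hours_to_degrees(hours + minutes / 60.0 + seconds / 3600.0)


ra_from_decimal_hours = ra_hours_to_degrees(12.5138)
ra_from_hms = hms_to_degrees(12, 30, 49.42)  # M87: approximately 12h30m49.42s
print(round(ra_from_decimal_hours, 3), round(ra_from_hms, 3))

# With astropy the same intent can be expressed through the unit instead:
#   SkyCoord(ra=12.5138 * u.hourangle, dec=12.3911 * u.deg, frame='icrs')
```

Both values come out close to the RA of 187.70593 degrees that NED reports for Messier 87 in the query by object name above.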
