# **MAST Data Bulk Download through AWS**
This notebook aims to make the **Mikulski Archive for Space Telescopes** (**MAST**) easier to use by showing astronomers and other scientists how to download mission-specific data in bulk. It covers two missions: `GALEX` and `Pan-STARRS (PS1)`.

To give some more context:
* [Galaxy Evolution Explorer (GALEX)](https://archive.stsci.edu/missions-and-data/galex) was a space telescope, managed primarily by JPL/Caltech, that was launched in 2003. GALEX was launched on a Pegasus launch vehicle built by the now-defunct Orbital Sciences Corporation, a type of rocket that one can view at the Smithsonian in Sterling, VA. It operated at ultraviolet wavelengths and was used to study star formation in galaxies from the early universe to the present. GALEX was decommissioned in 2013 and its data is archived at MAST.

* [Panoramic Survey Telescope and Rapid Response System (Pan-STARRS)](https://outerspace.stsci.edu/display/PANSTARRS/) is located at Haleakala Observatory in Hawaii, US, and surveys the sky for moving objects. Pan-STARRS was originally a collaboration between various academic institutions and the U.S. Air Force, given that the work has both scientific and defense/national security implications. The first Pan-STARRS telescope (PS1) has two data releases, DR1 and DR2, both of which we host here at MAST. DR2 is considered one of the largest astronomical datasets ever.

# Learning Goals
By using this notebook, an astronomer or scientist will:
* Understand that downloading data and files in bulk from AWS is feasible and possibly easier than going through the MAST portal.
* Make targeted queries to MAST using parameters such as: `right ascension`, `declination`, `observation` and more.
* Filter the resulting products by using parameters such as: `productType`, `productSubGroupDescription`, `productGroupDescription`, `mrp_only`, and more.
* Use this notebook to programmatically download *.fits* files to their computer. Once the *.fits* files are downloaded, the researcher can experiment and prototype much more easily.

# Table of Contents
* [Introduction](#Introduction)
* [Using Observations and the Common Archive Observation Model (CAOM)](#using-observations-and-the-common-archive-observation-model-caom)
* [Two Core Functions from Astropy: `query_criteria()` and `filter_products()`](#two-core-functions-from-astropy-query_criteria-and-filter_products)
* [The Three-Step Data Download Process](#the-three-step-data-download-process)

# Introduction
This notebook contains sample code to bulk download files from MAST, with examples provided for `GALEX` and `Pan-STARRS (PS1)`. The same approach can be generalized to query data from other missions, such as `SWIFT`, `HST`, or `IUE`. Please feel free to modify the code to your particular use case! If you have any questions, please don't hesitate to reach out to archive@stsci.edu.

Other links that may be useful:
- [MAST Homepage](https://archive.stsci.edu/)
- [MAST Notebook Repository](https://spacetelescope.github.io/mast_notebooks/intro.html)
- [GALEX Homepage](https://galex.stsci.edu/GR6/)
- [Pan-STARRS Homepage](https://archive.stsci.edu/panstarrs/)


# Using **Observations** and the [Common Archive Observation Model (CAOM)](https://mast.stsci.edu/vo-tap/api/v0.1/caom/)
* The `Observations` API from *astroquery.mast* can be used to query MAST data; specifically, it lets you query data from CAOM. CAOM is an observational database that houses metadata from multiple missions at the same time, from legacy missions to currently operational ones.

* *astroquery.mast* also provides a `MastMissions` API that can be used to query MAST data. This API is more limited in scope and only supports querying data from the missions `HST`, `JWST`, `CLASSY`, and `ULLYSES`. This notebook will not demonstrate the capabilities of the `MastMissions` API. Rather, please refer to the excellent notebook [**Searching for Mission-Specific Data with Astroquery**](https://spacetelescope.github.io/mast_notebooks/notebooks/multi_mission/missions_mast_search/missions_mast_search.html) by Sam Bianco.

In [13]:
from astroquery.mast import Observations

# Turning on access to the cloud dataset
Observations.enable_cloud_dataset()

INFO: Using the S3 STScI public dataset [astroquery.mast.cloud]
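With cloud access enabled, astroquery can also report where each product lives in S3 instead of downloading it. The sketch below is a minimal illustration under a few assumptions: `split_s3_uri` and `cloud_locations` are hypothetical helper names, the example key is made up, and products without a cloud copy are assumed to come back from `get_cloud_uris()` as `None`. The `products` argument is a product table such as the one built in Step 1 of this notebook.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an 's3://bucket/key' string into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def cloud_locations(products):
    """Return (bucket, key) pairs for the products that have a cloud copy."""
    # Imported lazily so split_s3_uri() can be used without astroquery installed.
    from astroquery.mast import Observations

    Observations.enable_cloud_dataset()
    uris = Observations.get_cloud_uris(products)
    # Assumption: entries with no cloud copy are returned as None and are skipped.
    return [split_s3_uri(uri) for uri in uris if uri]


# Hypothetical key, for illustration only.
print(split_s3_uri("s3://stpubdata/example/path/file.fits.gz"))
# ('stpubdata', 'example/path/file.fits.gz')
```

The (bucket, key) pairs can then be handed to whichever S3 client you prefer if you would rather manage the transfer yourself.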


# Two Core Functions from Astropy: `query_criteria()` and `filter_products()`

`query_criteria()` and `filter_products()` are two methods of the `Observations` class in astroquery (an Astropy-affiliated package). They let us make queries and then filter the corresponding products.

All the parameters that we could use in `query_criteria()` are shown below.

In [14]:
Observations.get_metadata('observations')['Column Name'].pprint(max_lines=-1)

# NOTE: Comment out the line above and uncomment the line below if you want more details about the parameters, with examples.
# Observations.get_metadata("observations").pprint(max_lines=-1, max_width=-1)


     Column Name     
---------------------
           intentType
       obs_collection
      provenance_name
      instrument_name
              project
              filters
    wavelength_region
          target_name
target_classification
               obs_id
                 s_ra
                s_dec
          proposal_id
          proposal_pi
            obs_title
     dataproduct_type
          calib_level
                t_min
                t_max
        t_obs_release
            t_exptime
               em_min
               em_max
                objID
             s_region
              jpegURL
             distance
                obsid
           dataRights
               mtFlag
               srcDen
              dataURL
        proposal_type
      sequence_number


All the fields that we can filter on in `filter_products()` are listed in the **[MAST API](https://masttest.stsci.edu/api/v0/_productsfields.html)** documentation.

# The Three-Step Data Download Process

Getting the data follows the three-step process outlined below.

* **STEP 1**: Get the products after making a specific query.
* **STEP 2**: Filter the products based on specific parameters.
* **STEP 3**: Download the files locally via Python.

**STEP 1**: When querying observations with `query_criteria()` in this notebook, specify two coordinates for the right ascension and two coordinates for the declination. Together they form a box that limits the search area. You must also supply the mission you want to search, such as 'GALEX' or 'PS1'.

To query on other criteria, see the column names listed above. Please modify this code for your specific use case!
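If you start from a target position rather than from box corners, the two coordinate pairs can be derived from a center and a half-width. The sketch below is one way to do that; `search_box` and `query_box` are hypothetical helper names, the box is a plain coordinate box (the RA width is not scaled by the cosine of the declination), and it assumes the box does not cross RA = 0/360 degrees.

```python
def search_box(ra_center, dec_center, half_width):
    """Return query_criteria() keyword arguments for a box around a position.

    All values are in degrees.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    # Clamp the declination range to the valid [-90, 90] interval.
    dec_lo = max(dec_center - half_width, -90.0)
    dec_hi = min(dec_center + half_width, 90.0)
    # Assumption: no wrap-around handling at RA = 0/360.
    return {
        "s_ra": [ra_center - half_width, ra_center + half_width],
        "s_dec": [dec_lo, dec_hi],
    }


def query_box(mission, ra_center, dec_center, half_width):
    """Query one mission inside the box (requires astroquery and network access)."""
    # Imported lazily so search_box() can be used without astroquery installed.
    from astroquery.mast import Observations

    box = search_box(ra_center, dec_center, half_width)
    return Observations.query_criteria(obs_collection=mission, **box)


# A center of (30.7, -9.75) with a half-width of 0.5 covers the same
# one-degree region used for the GALEX example in this notebook.
print(search_box(30.7, -9.75, 0.5))
```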

In [18]:
#    - Ex.: s_ra: 30.2,31.2
#           s_dec: -10.25,-9.25
#           obs_collection: GALEX, PS1

obs = Observations.query_criteria(s_ra=[30.2,31.2], s_dec=[-10.25,-9.25], obs_collection="GALEX")
prod = Observations.get_product_list(obs)
len(prod)


2291
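Before filtering, it can help to see which values actually occur in the product table, since the valid filter values differ from mission to mission. The sketch below counts the distinct values in a few columns; `value_counts` and `summarize` are hypothetical helper names, and the toy dictionary merely stands in for the `prod` table returned by `get_product_list()`.

```python
from collections import Counter


def value_counts(column):
    """Count how often each value occurs in a table column (or any iterable)."""
    return Counter(str(value) for value in column)


def summarize(products, names=("productType", "productSubGroupDescription")):
    """Return {column name: Counter of values} for the given columns."""
    return {name: value_counts(products[name]) for name in names}


# Toy stand-in for a product table; with astroquery you would pass `prod` instead.
toy_products = {
    "productType": ["SCIENCE", "SCIENCE", "PREVIEW"],
    "productSubGroupDescription": ["Imaging Only", "Catalog Only", "Imaging Only"],
}
for name, counts in summarize(toy_products).items():
    print(name, dict(counts))
```

Running `summarize(prod)` on a real product table gives a quick view of which filter values will match anything for the mission at hand.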

**STEP 2**: Now we can use `filter_products()` to select specific products. The code below filters on *productType*, *productSubGroupDescription*, *productGroupDescription*, and *mrp_only*. The valid values for GALEX and Pan-STARRS are listed below as examples. Other filter fields, and the values that apply to other missions, are described in the MAST API documentation linked above, so please check it to pick the right filters for your mission!

**GALEX Example**
* productType: *AUXILIARY*, *CATALOG*, *INFO*, *PREVIEW*, *SCIENCE*, *THUMBNAIL*
* productSubGroupDescription: *Catalog Only*, *Imaging Only*, *Spectra Only*, *Spectral Image Strips Only*, *Whole Field Images Only*
* productGroupDescription: *Minimum Recommended Products*
* mrp_only: *True*, *False*

**Pan-STARRS (PS1) Example**
* productType: *AUXILIARY*, *CATALOG*, *INFO*, *SCIENCE*
* productSubGroupDescription: - 
* productGroupDescription: *Minimum Recommended Products*
* mrp_only: *True*, *False*

Note that *productSubGroupDescription* and *productGroupDescription* may not be needed when filtering for Pan-STARRS products. An example for 'GALEX' is provided below as well as an example for PS1. Please modify this code for your specific use case!
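One way to keep the per-mission choices in a single place is a small table of presets, where `None` means "skip this filter". This is only a sketch: `MISSION_FILTERS`, `filters_for`, and `filter_for_mission` are hypothetical names, and the preset values simply mirror the GALEX and PS1 examples in this notebook.

```python
MISSION_FILTERS = {
    "GALEX": {
        "productType": "SCIENCE",
        "productSubGroupDescription": "Imaging Only",
        "productGroupDescription": "Minimum Recommended Products",
        "mrp_only": True,
    },
    "PS1": {
        "productType": None,  # skipped
        "productSubGroupDescription": None,  # skipped
        "productGroupDescription": None,  # skipped
        "mrp_only": True,
    },
}


def filters_for(mission):
    """Return the keyword arguments for filter_products(), dropping skipped entries."""
    if mission not in MISSION_FILTERS:
        raise ValueError(f"no filter preset for mission {mission!r}")
    preset = MISSION_FILTERS[mission]
    return {key: value for key, value in preset.items() if value is not None}


def filter_for_mission(products, mission):
    """Apply the preset to a product table (requires astroquery)."""
    # Imported lazily so filters_for() can be used without astroquery installed.
    from astroquery.mast import Observations

    return Observations.filter_products(products, **filters_for(mission))


print(filters_for("PS1"))
# {'mrp_only': True}
```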


In [19]:
#    - Ex. (GALEX): productType: SCIENCE
#           productSubGroupDescription: Imaging Only
#           productGroupDescription: Minimum Recommended Products
#           mrp_only: True

#    - Ex. (PS1): productType: <skip>
#                 productSubGroupDescription: <skip>
#                 productGroupDescription: <skip>
#                 mrp_only: True

# Use this for the 'GALEX' example.
filt_prod = Observations.filter_products(
    prod,
    productType="SCIENCE",
    productSubGroupDescription="Imaging Only",
    productGroupDescription="Minimum Recommended Products",
    mrp_only=True
)

# Shows how many files are left after applying the filter.
display(len(filt_prod))

# Shows the first 5 files from the filtered table.
display(filt_prod[0:5])

7

obsID,obs_collection,dataproduct_type,obs_id,description,type,dataURI,productType,productGroupDescription,productSubGroupDescription,productDocumentationURL,project,prvversion,proposal_id,productFilename,size,parent_obsid,dataRights,calib_level,filters
str6,str5,str8,str19,str139,str1,str166,str9,str28,str56,str1,str3,str1,str3,str68,int64,str6,str6,int64,str3
665,GALEX,image,2436590472420917248,Intensity map (J2000),C,mast:GALEX/url/data/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-fd-int.fits.gz,SCIENCE,Minimum Recommended Products,Imaging Only,--,MIS,--,--,MISDR1_18032_0666-fd-int.fits.gz,9559896,665,PUBLIC,2,FUV
665,GALEX,image,2436590472420917248,Intensity map (J2000),C,mast:GALEX/url/data/GR6/pipe/01-vsn/03716-MISDR1_18032_0666/d/01-main/0001-img/07-try/MISDR1_18032_0666-nd-int.fits.gz,SCIENCE,Minimum Recommended Products,Imaging Only,--,MIS,--,--,MISDR1_18032_0666-nd-int.fits.gz,17603914,665,PUBLIC,2,NUV
4923,GALEX,image,3209978155506860032,Intensity map (J2000),C,mast:GALEX/url/data/GR7/pipe/01-vsn/25697-GI5_028097_W1_18085_0274/d/01-main/0007-img/07-try/GI5_028097_W1_18085_0274-nd-int.fits.gz,SCIENCE,Minimum Recommended Products,Imaging Only,--,GII,--,177,GI5_028097_W1_18085_0274-nd-int.fits.gz,17139488,4923,PUBLIC,2,NUV
29153,GALEX,image,6380521092288610304,Intensity map (J2000),C,mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-fd-int.fits.gz,SCIENCE,Minimum Recommended Products,Imaging Only,--,AIS,--,--,AIS_273_sg03-fd-int.fits.gz,1455226,29153,PUBLIC,2,FUV
29153,GALEX,image,6380521092288610304,Intensity map (J2000),C,mast:GALEX/url/data/GR6/pipe/02-vsn/50273-AIS_273/d/01-main/0001-img/07-try/AIS_273_sg03-nd-int.fits.gz,SCIENCE,Minimum Recommended Products,Imaging Only,--,AIS,--,--,AIS_273_sg03-nd-int.fits.gz,7683367,29153,PUBLIC,2,NUV


**STEP 3**: Download the files to your local computer. The line below downloads only the first five files; adjust the slice if you need more. Thank you for going through this notebook, and please reach out if you have any questions!

In [None]:
Observations.download_products(filt_prod[0:5], cloud_only=True)
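For larger selections, it may be worth checking the total size first and then downloading in batches rather than in one call. The sketch below is one possible approach, not part of astroquery: `total_size_mb`, `batches`, and `download_in_batches` are hypothetical helper names, the default batch size of 50 is an arbitrary choice, and the sizes are the `size` values (in bytes) of the five rows shown in the filtered table above.

```python
def total_size_mb(sizes):
    """Sum file sizes given in bytes and return the total in megabytes."""
    return sum(int(size) for size in sizes) / 1e6


def batches(n_rows, batch_size):
    """Yield (start, stop) slice bounds that cover n_rows in chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


def download_in_batches(products, batch_size=50):
    """Download a product table chunk by chunk (requires astroquery and network access)."""
    # Imported lazily so the helpers above can be used without astroquery installed.
    from astroquery.mast import Observations

    Observations.enable_cloud_dataset()
    manifests = []
    for start, stop in batches(len(products), batch_size):
        manifests.append(
            Observations.download_products(products[start:stop], cloud_only=True)
        )
    return manifests


# Sizes of the five files listed in the filtered table above.
sizes = [9559896, 17603914, 17139488, 1455226, 7683367]
print(f"{total_size_mb(sizes):.1f} MB")  # 53.4 MB
print(list(batches(7, 5)))  # [(0, 5), (5, 7)]
```

With a real table, `total_size_mb(filt_prod["size"])` gives the estimate before you commit to the download.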

# About this Notebook

* **Authors**: Yingquan Li, Bernie Shao
* **Keywords**: GALEX, Pan-STARRS, Bulk Download, Python, AWS
* **Updated On**: 2024-11-08
* **References**: [Missions Mast Search (Sam Bianco)](https://github.com/spacetelescope/mast_notebooks/blob/main/notebooks/multi_mission/missions_mast_search/missions_mast_search.ipynb)

For support, please contact the Archive HelpDesk at archive@stsci.edu.