<a id="top"></a>

# Downloading Transiting Planet Data

Outstanding thoughts:
- restricted to transiting planets? See if this works for tau boo.
- do we need all methods?
- what exercises would be best?
- any other resources needed?

***

# Learning Goals

By the end of this tutorial, you will:

- Understand how MAST makes its transiting exoplanet time-series data accessible.
- Be able to download MAST-hosted data for specific exoplanets.
- Become familiar with the exo.MAST, astroquery, and MAST APIs.
- Sort MAST data product metadata by attributes (e.g., year, PI).

# Introduction

A number of space-based missions, including the Hubble Space Telescope (HST), the Transiting Exoplanet Survey Satellite (TESS), and the Spitzer Space Telescope, have observed exoplanets and their host stars. MAST hosts the data products from these disparate sources, making it possible to aggregate heterogeneous data on a single target. In this tutorial, we will cover how to explore, sort, and download MAST-hosted exoplanet data using a variety of methods.


# Imports

- *ast* (Python builtin) to safely evaluate strings.
- *sys* (Python builtin) to read our system's Python version.
- *os* (Python builtin) to create directories for downloaded data.
- *json* (Python builtin) to load results from HTTP GET requests.
- *pprint* (Python builtin) to neatly print json outputs.
- *urllib.request* (Python builtin) to submit HTTP GET requests and interact with the exo.MAST API. Todo: merge with requests?
- *urllib.parse* (Python builtin) to percent-encode request strings sent to the MAST API.
- *numpy* to help aggregate our metadata (can probably remove this dependency).
- *requests* to make requests to MAST servers.
- *astropy* to provide data structures that organize our metadata.
- *astroquery.mast* to interact with MAST data products.

In [None]:
import ast
import sys
import os
import json
import pprint
import urllib.request
from urllib.parse import quote as urlencode

import numpy as np
import requests
from astropy.table import unique, Table
from astroquery.mast import Observations

# Method 1: exo.MAST

[exo.MAST](https://exo.mast.stsci.edu/docs/) is a web service API that is optimized for querying exoplanet data. Our first approach to downloading MAST data will use this API.

This portion of the tutorial is inspired by the [exo.MAST documentation](https://exo.mast.stsci.edu/docs/getting_started.html).

We can query exo.MAST using a planet's name and the exoplanets/identifiers table. However, the name needs to be formatted for the web.

In [None]:
planet_name = 'WASP-12 b' #todo: add citation

We need to safely encode the spaces in this URL. This is done by replacing all spaces in the name with the "%20" string.

In [None]:
planet_name_formatted = planet_name.replace(' ', '%20')

request_name_string = f'exoplanets/identifiers/?name={planet_name_formatted}'
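
The manual replacement above only handles spaces. As a hedged sketch, the standard-library function `urllib.parse.quote` (imported earlier in this notebook as `urlencode`) percent-encodes spaces along with any other reserved characters. The helper name `build_identifier_url` is hypothetical and is not part of exo.MAST.

```python
from urllib.parse import quote

# Base URL of the exo.MAST API, as used in this tutorial
EXOMAST_BASE = "https://exo.mast.stsci.edu/api/v0.1/"


def build_identifier_url(planet_name):
    """Return the exoplanets/identifiers URL for a planet name.

    Hypothetical helper for illustration; it is not part of exo.MAST.
    """
    # quote() turns ' ' into '%20' and escapes other reserved characters
    encoded_name = quote(planet_name)
    return f"{EXOMAST_BASE}exoplanets/identifiers/?name={encoded_name}"


print(build_identifier_url("WASP-12 b"))
```

For names that contain only letters, digits, hyphens, and spaces, this gives the same result as the `replace` call.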




Now, we append our ``request_name_string`` to the URL that points to the exo.MAST API, forming a complete request.

In [None]:
request_name_url = "https://exo.mast.stsci.edu/api/v0.1/" + request_name_string
print(request_name_url)


With our URL assembled, we can now make the HTTPS request using the ``urllib`` package.

In [None]:
names = urllib.request.urlopen(request_name_url).read()
names

The ``b`` before the printed dictionary indicates that these data are currently represented as bytes. To represent the requested data as a string, we need to decode the bytes with the UTF-8 encoding.

In [None]:
dict_str = names.decode("UTF-8")

Now that we've converted the returned bytes into a string, we'd like to evaluate it to cast it into a more useful data type: a ``dict``.

However, the string must be valid Python before it can be evaluated. Note that some keys (e.g., ``"keplerID"``) have values of ``null``, which Python does not recognize. As a next step, we therefore replace all instances of "null" with "None", which will evaluate to a Python ``None``.

In [None]:
# null is not Pythonic
dict_str = dict_str.replace('null', 'None')
dict_str


At this point, we can evaluate the string to return a ``dict``. We will use the ``ast.literal_eval`` function to do so, as it is safer to use than the ``eval`` function.

In [None]:
name_matches = ast.literal_eval(dict_str)

In [None]:
name_matches

``name_matches`` is now a ``dict`` that contains our results. A few important fields:

- ``ra`` and ``dec``: Right ascension and declination of the planet's host star.
- `planetNames`: A ``list`` of names that are used for this planet in different catalogs.
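
As an aside, and assuming the response is valid JSON, the decode, replace, and evaluate steps above can also be handled by the ``json`` module. The sketch below uses a made-up payload (not the actual exo.MAST response) to show why this may be more robust: ``json.loads`` maps ``null``, ``true``, and ``false`` to their Python equivalents and leaves the word "null" alone when it appears inside a string value.

```python
import json

# Made-up payload for illustration; the real response has more fields
raw = b'{"planetNames": ["WASP-12 b"], "keplerID": null, "note": "nullable"}'

# json.loads accepts bytes directly and detects the UTF-8 encoding
parsed = json.loads(raw)
print(parsed["keplerID"])  # JSON null becomes Python None
print(parsed["note"])      # string values are left untouched

# The text replacement, by contrast, also rewrites string contents
replaced = raw.decode("UTF-8").replace("null", "None")
print(replaced)
```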

We can query exo.MAST for a list of files that are available for this planet. As before, we:

1. Format a request string
2. Submit the HTTPS request
3. Decode the request's result
4. Replace any "null"s with "None"s
5. Evaluate the string
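
Since these five steps repeat for every exo.MAST request, they can be gathered into one function. This is a minimal sketch: the names `parse_exomast_bytes` and `exomast_get` are hypothetical, and the optional `fetch` argument is an assumption added here so that the parsing can be tried without network access.

```python
import ast
import urllib.request

# Base URL of the exo.MAST API, as used in this tutorial
EXOMAST_BASE = "https://exo.mast.stsci.edu/api/v0.1/"


def parse_exomast_bytes(raw):
    """Steps 3-5: decode the bytes, replace nulls, and evaluate the string."""
    text = raw.decode("UTF-8")
    text = text.replace("null", "None")
    return ast.literal_eval(text)


def exomast_get(endpoint, fetch=None):
    """Steps 1-2, then parsing. `endpoint` must already be URL-encoded."""
    if fetch is None:
        # Default: perform a real HTTPS request
        def fetch(url):
            return urllib.request.urlopen(url).read()
    return parse_exomast_bytes(fetch(EXOMAST_BASE + endpoint))


# Offline demonstration with a made-up response body
def fake_fetch(url):
    return b'{"filenames": ["example.txt"], "comment": null}'


result = exomast_get("spectra/WASP-12%20b/filelist/", fetch=fake_fetch)
print(result)
```

Calling `exomast_get` without the `fetch` argument sends the request to exo.MAST itself.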

In [None]:
# construct request string
request_spectra_string = f'spectra/{planet_name_formatted}/filelist/'
request_spectra_url = "https://exo.mast.stsci.edu/api/v0.1/" + request_spectra_string

# send the request
spectra_result = urllib.request.urlopen(request_spectra_url).read()

# decode the result and make it Pythonic
dict_str = spectra_result.decode("UTF-8")
dict_str = dict_str.replace('null', 'None')

# evaluate the result
spectra = ast.literal_eval(dict_str)

``spectra`` is now a ``dict``. Its key ``filenames`` has a value that is a ``list`` of spectra associated with the submitted planet.

In [None]:
spectra

There are two files corresponding to spectra for this planet. Next, we can download them, once again following the above steps.

In [None]:
# construct request string
filename = spectra['filenames'][0]
request_file_string = f'spectra/{planet_name_formatted}/file/{filename}'
request_file_url = "https://exo.mast.stsci.edu/api/v0.1/" + request_file_string

# send the request
spectra_result = urllib.request.urlopen(request_file_url).read()

# decode the result and make it Pythonic
downloaded_file = spectra_result.decode("UTF-8")
downloaded_file

This time, however, the decoded data cannot be evaluated as a ``dict``. Rather, it is a single string of lines separated by the newline character ``\n``. We can write this string directly to a file.

In [None]:
with open(filename, 'w') as f:
    f.write(downloaded_file)

Note, however, that not all MAST data products are accessible by this method. (why is this, exactly?)

To download other data products, we turn to other methods.

# Method 2: Using astroquery.mast

For more MAST data products, we can make use of the astroquery.mast functionality. This approach requires an additional dependency (the [astroquery](https://astroquery.readthedocs.io/en/latest/) package).

This portion of the tutorial is inspired by the [astroquery.mast tutorial](https://astroquery.readthedocs.io/en/latest/mast/mast.html).

First, let's search for all MAST data products associated with WASP-12 b, the same exoplanet we queried with exo.MAST.

In [None]:
search_radius = ".02 deg"

planet_name = 'WASP-12 b'
obs_table = Observations.query_object(planet_name,radius=search_radius)
print(obs_table[:10])  

In [None]:
obs_table.columns

Let's sort these observations by the proposing PI (principal investigator) and filter out extraneous columns.

In [None]:
obs_table.sort('proposal_pi')

In [None]:
print(obs_table[['proposal_pi', 'provenance_name', 'dataproduct_type']])

If we're interested in data from a specific pipeline or project, we can next see which provenances (e.g., QLP, the TESS Quick-Look Pipeline) produced data for this target.

In [None]:
print(np.unique(obs_table['provenance_name']))

Great! In just a few lines, we've collected the metadata for many observations of this target into an Astropy ``Table``. Next, let's see what data products are available for the most recent QLP observation.

In [None]:
obs_table_qlp = obs_table[obs_table['provenance_name']=='QLP']
obs_table_qlp.sort('t_min')
data_products_by_obs = Observations.get_product_list(obs_table_qlp[-1])
print(data_products_by_obs) 

There are two timeseries data products. Let's download the first one. 

In [None]:
data_products_by_obs.columns

In [None]:
obs_collection = data_products_by_obs['obs_collection'][0]
obs_id = data_products_by_obs['obs_id'][0]

single_obs = Observations.query_criteria(obs_collection=obs_collection, obs_id=obs_id)
data_products = Observations.get_product_list(single_obs)

manifest = Observations.download_products(data_products, productType="SCIENCE")

In [None]:
print(manifest)

We've now successfully downloaded MAST data using the astroquery.mast API.

# Method 3: Directly using the MAST API
The final approach is a bit more hands-on and requires more code, but it allows for the most flexibility. Additionally, it provides the most insight into what's going on "under the hood" with the MAST requests. This approach requires the [Astropy](https://www.astropy.org/) and [NumPy](https://numpy.org/) dependencies.

This portion of the tutorial is inspired by the general [MAST API tutorial](https://mast.stsci.edu/api/v0/MastApiTutorial.html).

In [None]:
pp = pprint.PrettyPrinter(indent=4)

In [None]:
def mast_query(request):
    """Perform a MAST query.

    Parameters
    ----------
    request : dict
        The MAST request JSON object.

    Returns
    -------
    head : dict-like
        The response HTTP headers.
    content : str
        The returned data, decoded as UTF-8.
    """
    
    # Base API url
    request_url='https://mast.stsci.edu/api/v0/invoke'    
    
    # Grab Python Version 
    version = ".".join(map(str, sys.version_info[:3]))

    # Create HTTP Header Variables
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain",
               "User-agent":"python-requests/"+version}

    # Encoding the request as a json string
    req_string = json.dumps(request)
    req_string = urlencode(req_string)
    
    # Perform the HTTP request
    resp = requests.post(request_url, data="request="+req_string, headers=headers)
    
    # Pull out the headers and response content
    head = resp.headers
    content = resp.content.decode('utf-8')

    return head, content

Sticking with our previous example, let's look at the planet WASP-12 b. First, we need to format our request to the MAST name resolver, the service that converts an object name into sky coordinates.

In [None]:
object_of_interest = 'WASP-12 b'

resolver_request = {'service':'Mast.Name.Lookup',
                     'params':{'input':object_of_interest,
                               'format':'json'},
                     }

# Encoding the request as a json string
req_string = json.dumps(resolver_request)
req_string = urlencode(req_string)
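
To make the encoding step concrete, the sketch below builds the form body that will later be posted to MAST and then reverses the encoding to confirm that nothing was lost. The variable names `body` and `decoded` are illustrative; only `json` and `urllib.parse` from the standard library are used.

```python
import json
from urllib.parse import quote, unquote

resolver_request = {"service": "Mast.Name.Lookup",
                    "params": {"input": "WASP-12 b", "format": "json"}}

# Serialize the dict to JSON text, then percent-encode it for the form body
req_string = quote(json.dumps(resolver_request))
body = "request=" + req_string
print(body)

# Reversing both steps recovers the original dictionary
decoded = json.loads(unquote(body[len("request="):]))
print(decoded == resolver_request)
```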

Next, we set the header variables needed to make the request.

In [None]:
# Grab our system's Python version for the request. 
version = ".".join(map(str, sys.version_info[:3]))

# Create HTTP Header Variables
headers = {"Content-type": "application/x-www-form-urlencoded",
           "Accept": "text/plain",
           "User-agent":"python-requests/"+version}


Now, we construct our request URL and perform our HTTP request with the ``requests`` package.

In [None]:
# Base API url
request_url='https://mast.stsci.edu/api/v0/invoke'    

# Perform the HTTP request
resp = requests.post(request_url, data="request="+req_string, headers=headers)

The response must be decoded into a string, as in the previous examples, with the UTF-8 encoding. We'll also use the ``json`` package to parse the string.

In [None]:
resolved_object_string = resp.content.decode('UTF-8')
resolved_object = json.loads(resolved_object_string)

pp.pprint(resolved_object)

Parsing apart some of the output:
- the *cached* field denotes whether this result has already been saved on this device.
- the *canonicalName* field denotes the default name of the planet.
- the *decl* (float) field denotes the declination of the resolved coordinate.
- the *ra* (float) field denotes the right ascension of the resolved coordinate.
- the *searchRadius* field denotes the radius of the search.

See the documentation (link) for further information.

Now that we've resolved our target, let's save its coordinates as variables (as floats) — we'll need them later on.

In [None]:
obj_ra = resolved_object['resolvedCoordinate'][0]['ra']
obj_dec = resolved_object['resolvedCoordinate'][0]['decl']

obj_ra, obj_dec
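
If a name cannot be resolved, the `resolvedCoordinate` list may come back empty, in which case indexing it with `[0]` raises an `IndexError`. The sketch below shows one way to fail with a clearer message. It uses a hypothetical helper, `first_coordinate`, and made-up response dictionaries with approximate coordinates; it assumes the response layout shown above.

```python
def first_coordinate(resolved):
    """Return (ra, dec) from a Mast.Name.Lookup response dictionary.

    Hypothetical helper; assumes the response layout used in this tutorial.
    """
    matches = resolved.get("resolvedCoordinate", [])
    if not matches:
        raise ValueError("MAST could not resolve this object name.")
    # Use the first match, as in the cell above
    return float(matches[0]["ra"]), float(matches[0]["decl"])


# Made-up responses for illustration (coordinates are approximate)
example = {"resolvedCoordinate": [{"ra": 97.64, "decl": 29.67}]}
print(first_coordinate(example))

try:
    first_coordinate({"resolvedCoordinate": []})
except ValueError as err:
    print("Lookup failed:", err)
```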

With the coordinates of the object now known, we can run a *Mast.Caom.Cone* query to retrieve metadata on all MAST data around this coordinate.

In [None]:
mast_request = {'service':'Mast.Caom.Cone',
                'params':{'ra':obj_ra,
                          'dec':obj_dec,
                          'radius':0.2},
                'format':'json',
                'pagesize':2000,
                'page':1,
                'removenullcolumns':True,
                'removecache':True}


# Encoding the request as a json string
req_string = json.dumps(mast_request)
req_string = urlencode(req_string)

# Perform the HTTP request
resp = requests.post(request_url, data="request="+req_string, headers=headers)

# Decode the HTTP result
mast_data_string = resp.content.decode('UTF-8')
mast_data = json.loads(mast_data_string)


print(mast_data.keys())
print("Query status:",mast_data['status'])

Let's take a look at the first returned data entry.

In [None]:
pp.pprint(mast_data['data'][0])

There's a lot of metadata here, and it's a bit hard to understand all at once. To make things a bit more digestible, we can create an Astropy ``Table``.

In [None]:
mast_data_table = Table()

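# Build one column per returned field: translate MAST's type names into
# NumPy dtypes, then collect that field's value from every row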
for col,atype in [(x['name'],x['type']) for x in mast_data['fields']]:
    if atype=="string":
        atype="str"
    if atype=="boolean":
        atype="bool"
    mast_data_table[col] = np.array([x.get(col,None) for x in mast_data['data']],dtype=atype)
    
print(mast_data_table)
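
The loop above turns MAST's row-oriented JSON into columns: `mast_data['fields']` lists each column's name and type, while `mast_data['data']` holds one dictionary per observation. A dependency-free sketch of the same idea follows. The fields and rows are made up, and `rows_to_columns` is a hypothetical helper rather than part of any MAST library.

```python
# Map MAST type names onto Python types; unlisted names fall back to str
TYPE_MAP = {"string": str, "boolean": bool, "int": int, "float": float}


def rows_to_columns(fields, rows):
    """Transpose a list of row dictionaries into a dictionary of columns."""
    columns = {}
    for field in fields:
        name = field["name"]
        caster = TYPE_MAP.get(field["type"], str)
        # Missing keys and explicit nulls stay None instead of being cast
        columns[name] = [None if row.get(name) is None else caster(row[name])
                         for row in rows]
    return columns


# Made-up metadata for illustration
fields = [{"name": "obs_collection", "type": "string"},
          {"name": "t_min", "type": "float"}]
rows = [{"obs_collection": "TESS", "t_min": 58849.0},
        {"obs_collection": "HST"}]  # this row has no t_min

columns = rows_to_columns(fields, rows)
print(columns)
```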

With our metadata all acquired, we can now sort it by, e.g., start time (the ``t_min`` column).

In [None]:
mast_data_table.sort('t_min')

In [None]:
print(mast_data_table)
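
In MAST's observation metadata, `t_min` is documented as a Modified Julian Date (MJD). To sort or filter by calendar year instead, the MJD can be converted with the standard library. The sketch below uses a hypothetical helper, `mjd_to_datetime`, with made-up start times. It ignores leap seconds and time-scale subtleties, which is usually acceptable when only the year is needed.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17 00:00
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a naive datetime (approximate)."""
    return MJD_EPOCH + timedelta(days=float(mjd))


# Made-up start times for illustration
t_min_values = [58849.0, 51544.0, 55197.5]
years = sorted(mjd_to_datetime(t).year for t in t_min_values)
print(years)
```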

Let's get the most recent Spitzer data product. Because these data are sorted by time, the most recent data is the last entry, which we can access with the ``-1`` index.

In [None]:
# Pick the most recent Spitzer observation
recent_index = -1
interesting_observation = mast_data_table[mast_data_table["obs_collection"] == "SPITZER_SHA"][recent_index]
print("Observation:",
      [interesting_observation[x] for x in ['dataproduct_type', 'obs_collection', 'instrument_name']])

It appears that the latest Spitzer data for this target was taken with IRAC (Infrared Array Camera).

We can search MAST using the obsid of this observation to determine how many data products are associated with this observation.

In [None]:
obsid = interesting_observation['obsid']

product_request = {'service':'Mast.Caom.Products',
                  'params':{'obsid':obsid},
                  'format':'json',
                  'pagesize':100,
                  'page':1}   

# Encoding the request as a json string
req_string = json.dumps(product_request)
req_string = urlencode(req_string)

# Perform the HTTP request
resp = requests.post(request_url, data="request="+req_string, headers=headers)

# Decode the HTTP result
obs_products_string = resp.content.decode('UTF-8')
obs_products = json.loads(obs_products_string)

print("Number of data products:", len(obs_products["data"]))
print("Product information column names:")
pp.pprint(obs_products['fields'])

We can also see what *types* these data products are.

In [None]:
pp.pprint([x.get('productType',"") for x in obs_products["data"]])
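
A long list of product types is easier to read as a tally. As a small sketch, `collections.Counter` from the standard library counts how often each type occurs. The `example_products` list is made up and stands in for `obs_products['data']`.

```python
from collections import Counter

# Made-up product entries for illustration; the last one has no productType
example_products = [{"productType": "SCIENCE"},
                    {"productType": "PREVIEW"},
                    {"productType": "SCIENCE"},
                    {}]

# Count each product type, using "" when the key is missing
type_counts = Counter(p.get("productType", "") for p in example_products)

for product_type, count in type_counts.most_common():
    print(product_type or "(none)", count)
```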

We can place these results in a table as well, restricting ourselves to the science products.

In [None]:
sci_prod_arr = [x for x in obs_products['data'] if x.get("productType", None) == 'SCIENCE']
science_products = Table()

for col, atype in [(x['name'], x['type']) for x in obs_products['fields']]:
    if atype=="string":
        atype="str"
    if atype=="boolean":
        atype="bool"
    if atype == "int":
        atype = "float" # array may contain nan values, and they do not exist in numpy integer arrays
    science_products[col] = np.array([x.get(col,None) for x in sci_prod_arr],dtype=atype)

print("Number of science products:",len(science_products))
print(science_products)

Next, let's download these data products using the [requests](https://requests.readthedocs.io/en/latest/) package.

In [None]:
download_url = 'https://mast.stsci.edu/api/v0.1/Download/file?'

for row in science_products:     

    # Make file path, creating the directory if it does not yet exist
    out_path = os.path.join("mastFiles", row['obs_collection'], row['obs_id'])
    os.makedirs(out_path, exist_ok=True)
    out_path = os.path.join(out_path, os.path.basename(row['productFilename']))
        
    # Download the data
    payload = {"uri":row['dataURI']}
    resp = requests.get(download_url, params=payload)
    
    # Report failed requests instead of saving an error response to disk
    if not resp.ok:
        print("ERROR: " + out_path + " failed to download.")
        continue
    
    # Save to file
    with open(out_path,'wb') as f:
        f.write(resp.content)
        
    print("COMPLETE: ", out_path)

You can check that these data files have been downloaded correctly by checking the directory that the `out_path`s are downloaded to.

In [None]:
ls mastFiles/SPITZER_SHA/000001AF1000/

Great! We've successfully downloaded the data for this planet via MAST.

# Exercises
- Repeat this for Spitzer?
- Query a light curve?
- Read in and plot the data? Compare to a downloaded plot?

# Additional Resources
- An [introduction to HTTP GET requests](https://www.ibm.com/docs/en/cics-ts/5.3?topic=protocol-http-requests).
- Primers on exoplanet spectral data types ([Deming, Louie, and Sheets 2018](https://iopscience.iop.org/article/10.1088/1538-3873/aae5c5/meta?casa_token=253HfRr4kyYAAAAA:C0CtfuH4Um2l4Kul5O3tajY2TolSVuXi8fGj48bzSlmJIuvPmeYkb1yXtd10MOjwPqJokDpNvv4)) and on lightcurves ([Winn 2010](https://books.google.com/books?hl=en&lr=&id=VlSVmxgPgGYC&oi=fnd&pg=PA55&dq=exoplanet+transits&ots=-sl8U--Mws&sig=lvBd93ioa2YDxehtpxSx6nL82UA#v=onepage&q=exoplanet%20transits&f=false))
- Information on the [TESS Mission](https://tess.mit.edu/).
- A [neat blog post](https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html) on why `literal_eval` is preferred to `eval`.
- [Prettyprint documentation](https://docs.python.org/3/library/pprint.html)

- [Gaudi, B. Scott, et al. "A giant planet undergoing extreme-ultraviolet irradiation by its hot massive-star host." Nature 546.7659 (2017): 514-518.](https://www.nature.com/articles/nature22392)
- [Bakos, G. Á., et al. "HAT-P-11b: A super-Neptune planet transiting a bright K star in the Kepler field." The Astrophysical Journal 710.2 (2010): 1724.](https://iopscience.iop.org/article/10.1088/0004-637X/710/2/1724/meta)
- [exo.MAST tutorial](https://exo.mast.stsci.edu/docs/getting_started.html#resolving-exoplanets)
- [astroquery.mast documentation](https://astroquery.readthedocs.io/en/latest/mast/mast.html)
- [MAST API documentation](https://mast.stsci.edu/api/v0/MastApiTutorial.html)

# About this Notebook

**Author**: Arjun B. Savel (asavel@umd.edu).

**Last updated**: 2022-06-12

# Citations
If you use `astropy`, `astroquery`, or `numpy` for published research, please cite the
authors. Follow these links for more information about citing `astropy`,
`astroquery`, and `numpy`:

* [Citing `astropy`](https://www.astropy.org/acknowledging.html)
* [Citing `astroquery`](https://astroquery.readthedocs.io/en/latest/#astroquery)
* [Citing `numpy`](https://numpy.org/citing-numpy/)


***

[Top of Page](#top)<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>
