# Find Existing and Planned JWST Observations Using _astroquery_

## Introduction

As with _HST_, _JWST_ observers are not allowed to propose observations that duplicate existing, planned, or approved observations unless they provide a scientific justification in their proposal and that request is approved. Consult the [JWST Duplicate Observation Policy](https://jwst-docs.stsci.edu/jwst-opportunities-and-policies/jwst-general-science-policies/jwst-duplicate-observations-policy) for details. Broadly speaking, observations may be considered duplicates if they are obtained with the same scientific instrument (or a different instrument with similar configurations and capabilities) and two or more of the following apply:
  * Same astrophysical source, or significant spatial overlap of fields
  * Similar imaging passband, or overlapping spectral range
  * Similar (spectral) resolution
  * Similar exposure depth
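
The "same instrument plus two or more criteria" rule of thumb can be written as a small screening function. This is only a sketch: the function name `might_duplicate` and the criterion key names are our own hypothetical labels, deciding whether each criterion applies is left to you, and the Duplicate Observation Policy itself remains the authority.

```python
# Hypothetical labels for the four criteria listed above.
CRITERIA = (
    'same_source_or_field_overlap',
    'similar_passband_or_spectral_range',
    'similar_resolution',
    'similar_depth',
)

def might_duplicate(same_or_similar_instrument, criteria):
    """Screening version of the rule of thumb: a potential duplicate needs the same
    (or a similar) instrument plus at least two of the four criteria.

    criteria: dict mapping names in CRITERIA to True/False; missing names count as False.
    """
    unknown = set(criteria) - set(CRITERIA)
    if unknown:
        raise ValueError('unknown criteria: {}'.format(sorted(unknown)))
    if not same_or_similar_instrument:
        return False
    # Count the criteria that apply; two or more flags a potential duplicate
    n_matching = sum(bool(criteria.get(name, False)) for name in CRITERIA)
    return n_matching >= 2
```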

This notebook illustrates how to use the python package [astroquery](https://astroquery.readthedocs.io/en/latest/mast/mast.html) to search the Mikulski Archive for Space Telescopes (MAST) for potential duplicate observations. Proposers may also use the [MAST Portal](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html) to search the archive, but that may be less efficient for large numbers of targets. 

<ul>
    <li><a href="#Setup">Setup</a></li>
    <li><a href="#Example-Queries">Example Queries</a></li>
    <ul>
        <li><a href="#Target-by-Name">Single Target by Name</a></li>
        <li><a href="#Moving-Target">Single Moving Target</a></li>
        <li><a href="#Target-Field">Target Field by Position</a></li>
        <li><a href="#Target-List">Search a Target List</a></li>
        <li><a href="#upload-targets">Loading Targets from a File</a></li>
    </ul>
    <li><a href="#Resources">Additional Resources</a></li>
</ul>

### Special Disclaimer
<a id="Disclaimer"></a>

The capabilities described here will help <em>identify</em> potential duplications between your intended JWST observations and those that have been approved, planned, or that have already executed. 

<div class="alert alert-block alert-info">

<span style="color:black">
The complete footprint of approved (but not executed) dithered or mosaicked observations is only approximate. That is, only the primary location is reported for an observation, but not necessarily those for associated dither positions or mosaic tiles. Moreover, metadata in MAST about planned/approved observations is <b>not sufficient</b> to determine precisely whether your intended observation is a genuine duplication, particularly for slit or MOS spectroscopy. You are responsible for evaluating the details of the planned observations by using the accepted program's APT file (and/or the Aladin display in APT, as appropriate) to determine if the potential duplications are genuine.
</span>
</div>


<a id="Setup"></a>
## Setup

We begin by importing some essential python packages: general utilities in [astropy](https://www.astropy.org/), the [pandas](https://pandas.pydata.org/) data manipulation library, and query services in astroquery. We also define a utility routine to create URLs to the parent programs of matching observations. 

In [None]:
import pandas as pd
from astropy import table
from astropy import units as u
from astropy.coordinates import Angle
from astroquery.mast import Observations

# Give the notebook cells more of the available width
from IPython.display import display, HTML
display(HTML("<style>.container { width:99% !important; }</style>"))

APT_LINK = 'http://www.stsci.edu/cgi-bin/get-proposal-info?id={}&observatory=JWST'

def get_program_URL(program_id):
    """
    Generate the URL for program status information, given a program ID.
    """
    return APT_LINK.format(program_id)

The results of an astroquery search are contained in an [astropy table](https://docs.astropy.org/en/stable/table/). There are multiple ways to display the results; the function below displays table fields that are most relevant for identifying potential duplications of JWST observations, and should be treated as illustrative. 

In [None]:
def display_results(obs):
    """
    Simple display of the results most relevant for identifying potentially duplicating targets.
    The observation program title is truncated to 70 characters for presentation in this notebook.
    Note that this adds a 'proposal_URL' column to, and truncates 'obs_title' in, the input table.
    """
    # build the URL to the JWST programs.
    obs['proposal_URL'] = [get_program_URL(x) for x in obs['proposal_id']]
    obs['obs_title'] = [x[:70] for x in obs['obs_title']]
    obs['obs_title'].info.format = '<'
    obs['target_name', 'instrument_name', 'filters', 'dataproduct_type', 't_exptime',
        'proposal_id'].pprint(max_lines=40, max_width=90)

    print("\nUnique Program Titles:")
    table.unique(obs, keys=['proposal_id'])['proposal_id', 'obs_title'].pprint(max_width=100)
    print("\nUnique URLs to status of existing programs:")
    # sort so the list is printed in a stable order
    for url in sorted(set(obs['proposal_URL'])):
        print(url)

<a id="Example-Queries"></a>
## Example Queries

All of the queries below search for JWST observations, using a search radius somewhat larger than the field of view (FoV) of interest, to allow for the possibility that the FoV may be rotated when approved-but-unexecuted observations are actually scheduled. If your intended observation uses a different FoV, adjust the search radius accordingly.
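
One way to size that radius: a rectangular field of width w and height h, centered on the pointing, stays inside a circle of radius ½·√(w² + h²) (half its diagonal) at any roll angle. The sketch below computes that radius with an optional safety margin; the function name and the margin are our own illustrative choices, not values from the duplication policy.

```python
import math

def min_search_radius_arcsec(width_arcsec, height_arcsec, margin=1.0):
    """Smallest radius containing a width x height field centered on the pointing
    at any roll angle (half the field's diagonal), times an optional safety margin."""
    half_diagonal = 0.5 * math.hypot(width_arcsec, height_arcsec)
    return margin * half_diagonal
```

For example, `radius=min_search_radius_arcsec(60, 60, margin=1.2) * u.arcsec` could be passed to `Observations.query_criteria()`; the 60×60 arcsec field and 20% margin are illustrative, not JWST instrument values.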

<a id="Target-by-Name"></a>
### Single Target by Name

This example shows how to query for a single target with a standard name, **TRAPPIST-1**, a star with seven known exoplanets. The intended observations would be time-series imaging in a small FoV. Note that the name will be resolved automatically to coordinates in this case. We use the <b><code>query_criteria()</code></b> method to limit the search to JWST observations.

In [None]:
obs = Observations.query_criteria(
        objectname="Trappist-1", 
        radius="10s", 
        obs_collection="JWST"
        )
print('Number of matching observations: {}'.format(len(obs)))

Examine the returned table columns most relevant for identifying potential duplications. Note: it is still up to you to determine whether these observations duplicate the ones you are planning. For instance, the table does not provide the timing information needed to determine which TRAPPIST-1 planet an observation targets. In some cases, the target name or proposal title (<code>obs_title</code>) contains this information.

In [None]:
display_results(obs)

<a id="Moving-Target"></a>
### Single Moving Target

This example shows how to query for a moving target. This kind of search is limited to a modest set of solar system bodies with recognized names. Note the use of a wildcard character (*) in case the target name includes other text.

In [None]:
obs = Observations.query_criteria(
        target_name="Io*",
        obs_collection="JWST"
        )
    
display_results(obs)
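
A prefix wildcard such as `Io*` may also return targets whose names merely begin with the same letters. The sketch below post-filters the returned names on your side of the query, keeping only the exact name or the name followed by a non-letter (space, hyphen, digit). The function name is hypothetical, the filter ignores case, and it does not describe how MAST itself applies wildcards.

```python
import re

def exact_target_matches(names, target):
    """Return the names that are `target` itself, or `target` followed by a
    non-letter character, ignoring case (so 'IOTA-HOR' is dropped for 'Io')."""
    # (?![A-Za-z]) rejects matches where the prefix continues into a longer word
    pattern = re.compile(r'^' + re.escape(target) + r'(?![A-Za-z])', re.IGNORECASE)
    return [name for name in names if pattern.match(str(name))]
```

To apply it to the results table: `keep = set(exact_target_matches(obs['target_name'], 'Io'))`, then `obs = obs[[name in keep for name in obs['target_name']]]`.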

<a id="Target-Field"></a>
### Target Field by Position

This example shows how to search an area of sky for overlap with a proposed deep field. The field center (RA, Dec) is (12:12:22.513, +27:34:13.88), and the planned survey area is 30&times;30 arcmin. We will limit the search to JWST imaging observations. First, convert the coordinates to degrees, then execute the search.
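
The astropy `Angle` class in the cell below handles this conversion. For reference, the arithmetic is simple: sexagesimal fields are degrees (or hours), minutes, and seconds, and right ascension in hours is multiplied by 15 because 24 h = 360°. A minimal standard-library sketch (the function name is our own):

```python
def sexagesimal_to_deg(value, is_hours=False):
    """Convert 'DD:MM:SS.s' (or 'HH:MM:SS.s' with is_hours=True) to decimal degrees."""
    text = value.strip()
    # Take the sign from the string, so that '-00:30:00' stays negative
    sign = -1.0 if text.startswith('-') else 1.0
    fields = [float(f) for f in text.lstrip('+-').split(':')]
    fields += [0.0] * (3 - len(fields))  # allow 'DD:MM' or 'DD'
    degrees = fields[0] + fields[1] / 60.0 + fields[2] / 3600.0
    if is_hours:
        degrees *= 15.0  # 24 h = 360 deg, so 1 h = 15 deg
    return sign * degrees
```

For this field, `sexagesimal_to_deg('12:12:22.513', is_hours=True)` gives about 183.0938° and `sexagesimal_to_deg('+27:34:13.88')` about 27.5705°.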

In [None]:
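import math
ra_deg = Angle('12:12:22.513 hours').degree
dec_deg = Angle('+27:34:13.88 degree').degree
# Half-width of the 30x30 arcmin survey box, in degrees on the sky. An interval of RA
# covers less sky away from the equator, so scale the RA half-width by 1/cos(Dec).
half_width = 0.25
ra_half_width = half_width / math.cos(math.radians(dec_deg))
# Note: s_ra/s_dec select on each observation's central position, not its full footprint.
obs = Observations.query_criteria(
        s_ra=[ra_deg - ra_half_width, ra_deg + ra_half_width],
        s_dec=[dec_deg - half_width, dec_deg + half_width],
        dataproduct_type="image",
        obs_collection="JWST"
        )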

There is clearly an overlap with another program, but only in certain filters:

In [None]:
display_results(obs)
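
To see at a glance which instrument/filter combinations overlap, you could tally the matches per (instrument, filter) pair. This is a small sketch; the helper name is hypothetical, and it relies only on the `instrument_name` and `filters` columns that `display_results()` already prints.

```python
from collections import Counter

def count_by_instrument_filter(instruments, filters):
    """Count matching observations per (instrument, filter) pair."""
    return Counter(zip((str(i) for i in instruments), (str(f) for f in filters)))
```

For example: `for (inst, filt), n in sorted(count_by_instrument_filter(obs['instrument_name'], obs['filters']).items()): print(inst, filt, n)`.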

<a id="Target-List"></a>
### Search for Observations of Targets in a List

For individual targets (as above), the [MAST Portal](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html) may be the better tool because its results are easy to visualize. For a large list of targets, however, searching with astroquery is usually more efficient.

Your list might be stored in a file on your local system, and consist of coordinates and custom search radii. But for simplicity the list in this example consists of standard target names, constructed in code. Not all of the targets have approved or existing JWST observations, so the first step is to determine the number of observations for each target using the astroquery method <b><code>Observations.query_criteria_count()</code></b>.

<div class="alert alert-block alert-info">

<span style="color:black">
    It is good practice to first check the number of matching observations before fetching the results themselves, in case the number of results is extremely large. This is especially important when querying large MAST missions, such as <i>HST</i>. Note that even for a modest number of results this query may take several seconds.
    
</span>
</div>
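
The count-then-fetch practice can be wrapped in a small helper that takes the count and fetch calls as arguments, so it works with any pair of astroquery calls. A sketch under our own assumptions: the helper name and the default limit of 5000 rows are hypothetical choices, not MAST limits.

```python
def fetch_if_reasonable(count_fn, fetch_fn, max_results=5000):
    """Run count_fn() first; call fetch_fn() only if the count is non-zero and at
    most max_results. Returns (count, results); results is None if nothing was fetched."""
    n = count_fn()
    if n == 0:
        return n, None
    if n > max_results:
        print('{} matching observations exceed max_results={}; refine the query.'.format(n, max_results))
        return n, None
    return n, fetch_fn()

# Example with astroquery (requires network access):
# from functools import partial
# criteria = dict(objectname='Fomalhaut', radius='30s', obs_collection='JWST')
# n, obs = fetch_if_reasonable(partial(Observations.query_criteria_count, **criteria),
#                              partial(Observations.query_criteria, **criteria))
```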

In [None]:
# Create a dictionary to contain the number of observations for each target
targets = {'CX Tau':0, 'Fomalhaut':0,'HL Tauri':0,'M 8':0,'HD 12345':0}

search_radius = '30'
for t,n in targets.items():
    targets[t] = Observations.query_criteria_count(
            objectname=t, 
            radius='{}s'.format(search_radius), 
            obs_collection='JWST'
            )

targets

It is clear that none of the targets in the list has an excessive number of matching observations. Now check the results for the targets with non-zero matching observations in detail. <b>Note:</b> since the loop creates one astropy table for each search, we place each in a list and then concatenate them for display. 

In [None]:
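obs_list = []
for t, n in targets.items():
    # Only query targets with at least one matching observation
    if n > 0:
        obs = Observations.query_criteria(
            objectname=t,
            radius='{}s'.format(search_radius),
            obs_collection='JWST'
            )
        obs_list.append(obs)

# vstack() needs at least one table, so guard against the case of no matches at all
if obs_list:
    target_matches = table.vstack(obs_list)
    display_results(target_matches)
else:
    print('No JWST observations match any target in the list.')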

If you write the results table to a disk file in ECSV format (see [astropy table I/O](https://docs.astropy.org/en/stable/io/unified.html#table-io-ascii)), the file preserves the column data types and the table metadata, and it can be read back as an astropy table in a later Python session (for example, with `Table.read('target_matches.ecsv')`).

In [None]:
target_matches.write('target_matches.ecsv', format='ascii.ecsv')

<a id="upload-targets"></a>
### Loading Targets from a File
Sometimes it may be more convenient to read a list of targets from a local file rather than specifying each one manually. We provide a convenience function that reads a list of target names or coordinates from an input CSV file and returns them as a list. First we define the function `load_targets_from_file`.

In [None]:
def load_targets_from_file(filename, load_by='name', namecol='target_name', racol='ra', deccol='dec'):
    ''' Load a list of targets from a CSV file.

    Reads a CSV file and returns a list of targets: either a list of target names or a
    list of (RA, Dec) tuples. Column names are matched case-insensitively, ignoring
    surrounding whitespace. When loading by target name with ``namecol`` set to None or '',
    the first column is assumed to contain the target name. When loading by coordinate with
    ``racol`` or ``deccol`` set to None or '', the first two columns are assumed to contain
    the RA and Dec.

    Parameters:
        filename (str):
            The input filename to load
        load_by (str):
            Load by target name or coordinate.  Either "name" or "coord".  Default is load by target name.
        namecol (str):
            The name of the column in the file containing target name.  Default is "target_name".
        racol (str):
            The name of the column in the file containing Right Ascension.  Default is "ra".
        deccol (str):
            The name of the column in the file containing Declination.  Default is "dec".

    Returns:
        A list of target names, or a list of (RA, Dec) tuples, loaded from the file

    '''

    if not isinstance(filename, str):
        raise TypeError('filename must be a string')
    if load_by not in ('name', 'coord'):
        raise ValueError('load_by can only be either "name" or "coord"')
    if not filename.lower().endswith('.csv'):
        raise ValueError('filename must be a valid csv file')

    # read the csv file into a Pandas dataframe; skipinitialspace drops the blank after each
    # comma, and column names are formatted to lowercase with whitespace stripped
    df = pd.read_csv(filename, skipinitialspace=True)
    df.rename(columns=lambda x: x.strip().lower(), inplace=True)

    # extract target names
    if load_by == 'name':
        namecol = (namecol or '').strip().lower()
        if namecol and namecol not in df.columns:
            raise KeyError('csv file does not contain column {0}'.format(namecol))
        targets = df[namecol] if namecol else df[df.columns[0]]
        return targets.to_list()

    # extract target coordinates (load_by == 'coord')
    racol = (racol or '').strip().lower()
    deccol = (deccol or '').strip().lower()
    for col in (racol, deccol):
        if col and col not in df.columns:
            raise KeyError('csv file does not contain column {0}'.format(col))
    coord = df[[racol, deccol]] if racol and deccol else df[df.columns[0:2]]
    return coord.to_records(index=False).tolist()

Suppose we have a CSV file, `targets.csv`, on disk with the following contents (the coordinates are placeholders, repeated on every row):
```
target_name, RA, DEC
CX Tau, 67.910, 18.2336
Fomalhaut, 67.910, 18.2336
HL Tauri, 67.910, 18.2336
M 8, 67.910, 18.2336
HD 12345, 67.910, 18.2336
```
We can use our new function to read in the targets. By default, ``load_targets_from_file`` loads objects by target name; set the ``load_by`` keyword argument to ``'coord'`` to load object coordinates instead. When loading objects as target names, ``load_targets_from_file`` looks for a column called ``target_name``. When loading objects as coordinates, it looks for columns called ``ra`` and ``dec``. Column names are matched case-insensitively, so the ``RA`` and ``DEC`` headers above are found. If your file uses different column names, override them with the ``namecol``, ``racol``, and ``deccol`` keyword arguments, respectively.
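
The cells below expect `targets.csv` to exist. If you do not have one yet, this sketch writes the example contents shown above to disk. The helper name is hypothetical, and it leaves any existing file at that path untouched unless `overwrite=True`.

```python
import os

# The example target list from above (placeholder coordinates).
SAMPLE_TARGETS_CSV = """target_name, RA, DEC
CX Tau, 67.910, 18.2336
Fomalhaut, 67.910, 18.2336
HL Tauri, 67.910, 18.2336
M 8, 67.910, 18.2336
HD 12345, 67.910, 18.2336
"""

def write_sample_targets(path='targets.csv', overwrite=False):
    """Write the example target list to `path` unless a file already exists there."""
    if os.path.exists(path) and not overwrite:
        print('{} already exists; leaving it unchanged.'.format(path))
        return path
    with open(path, 'w') as f:
        f.write(SAMPLE_TARGETS_CSV)
    return path

write_sample_targets()
```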

In [None]:
# load objects as target names
target_names = load_targets_from_file('targets.csv')
print('Targets', target_names)

In [None]:
# load objects as target coordinates
target_coords = load_targets_from_file('targets.csv', load_by='coord')
print('Targets', target_coords)

Once your targets are loaded, you may query on them using the same ``astroquery`` functions as above, depending on whether you want to search by name or by coordinate. For example, to query by name using your list of target names, it is good practice (as before) to first count the matching observations for each target before fetching the results.

In [None]:
# convert names to dictionary to hold result counts
targets = {name:0 for name in target_names}

# get the counts of each target
search_radius = '30'
for t,n in targets.items():
    targets[t] = Observations.query_criteria_count(
            objectname=t, 
            radius='{}s'.format(search_radius), 
            obs_collection='JWST'
            )

targets
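
The same counting step can be done by position using the coordinate list. A sketch, assuming (as in the name-based cells above) that `Observations.query_criteria_count()` accepts the positional criteria `coordinates` and `radius` along with `obs_collection`; both function names are hypothetical, and the default 30 arcsec radius mirrors `search_radius` above.

```python
def count_by_position(coords, count_fn, radius_arcsec=30.0):
    """Return {(ra, dec): count}, calling count_fn(ra_deg, dec_deg, radius_arcsec)
    for each coordinate pair. Identical coordinates share one dictionary key."""
    counts = {}
    for ra, dec in coords:
        counts[(ra, dec)] = count_fn(float(ra), float(dec), radius_arcsec)
    return counts

def mast_jwst_count(ra_deg, dec_deg, radius_arcsec):
    """Count JWST observations in MAST within radius_arcsec of (ra_deg, dec_deg)."""
    # Imported here so that count_by_position() can be used without astroquery
    from astropy import units as u
    from astropy.coordinates import SkyCoord
    from astroquery.mast import Observations
    return Observations.query_criteria_count(
        coordinates=SkyCoord(ra_deg, dec_deg, unit='deg'),
        radius=radius_arcsec * u.arcsec,
        obs_collection='JWST')
```

Usage: `position_counts = count_by_position(target_coords, mast_jwst_count)`. With the example file, every row has the same coordinates, so the result has a single entry.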

<a id="Resources"></a>
## Additional Resources

* [astropy](https://docs.astropy.org/en/stable/index.html) documentation
* [astroquery](https://astroquery.readthedocs.io/en/latest/mast/mast.html) documentation for querying MAST
* [Queryable fields](https://mast.stsci.edu/api/v0/_c_a_o_mfields.html) in the MAST/CAOM database
* The [MAST Portal](https://mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html) web interface
* [pandas](https://pandas.pydata.org/) documentation

## About this notebook
This notebook was developed by Archive Sciences Branch staff, chiefly Susan Mullally and Dick Shaw. For support, please contact the Archive HelpDesk, at archive@stsci.edu. 
<img style="float: right;" src="https://raw.githubusercontent.com/spacetelescope/notebooks/master/assets/stsci_pri_combo_mark_horizonal_white_bkgd.png" alt="Space Telescope Logo" width="200px"/>