# Finding and Downloading Data For an Object Using Python
<hr style="border: 2px solid #fadbac" />

- **Description:** Tutorial on how to access HEASARC data using the Virtual Observatory client `pyvo`.
- **Level:** Intermediate
- **Data:** Find and download NuSTAR observations of the AGN **3C 105**
- **Requirements:** `pyvo`.
- **Credit:** Abdu Zoghbi (May 2022).
- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback).
- **Last verified to run:** 02/28/2024

<hr style="border: 2px solid #fadbac" />

## 1. Introduction
This notebook is a tutorial on how to access HEASARC data using the Virtual Observatory (VO) Python client `pyvo`.

We handle the case of a user searching for data on a specific astronomical object in a *specific* high-energy table. For a more general data access tutorial, see the <span style="color:red">add reference here when structure is restored</span>.

We will find all NuSTAR observations of **3C 105** that have an exposure of less than 10 ks.


This notebook searches the NuSTAR master catalog `numaster` using `pyvo`. We specifically use the `conesearch` service, which is the VO service that allows searching around a position on the sky (3C 105 in this case).
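To make the idea of a cone search concrete, here is a minimal offline sketch of the geometry involved: keep the catalog entries whose angular separation from a center position is within some radius. It is not how the HEASARC service is implemented. The function names, the two catalog positions, and the 0.5 degree radius are all hypothetical and chosen only for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # haversine formula, which behaves well for small separations
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def in_cone(center, point, radius_deg):
    """True if `point` lies within `radius_deg` of `center` (both are (RA, Dec) in degrees)."""
    return angular_separation_deg(*center, *point) <= radius_deg


# hypothetical center and catalog positions, for illustration only
center = (61.80, 3.70)
catalog = {"near": (61.81, 3.69), "far": (65.00, 3.70)}

matches = [name for name, pos in catalog.items() if in_cone(center, pos, radius_deg=0.5)]
print(matches)
```

Only the entry labeled `near` falls inside the cone, so it is the only one kept.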

<div style='color: #333; background: #ffffdf; padding:20px; border: 4px solid #fadbac'>
<b>Running On Sciserver:</b><br>
The notebook requires <code>pyvo</code>, which on Sciserver is available in the <code>heasoft</code> conda kernel. Make sure you run the notebook using that kernel by selecting it in the top right.
</div>

## 2. Module Imports
We need the following Python modules:

In [1]:
# pip install pyvo astropy

In [2]:
import os

# pyvo for accessing VO services
import pyvo

# Use SkyCoord to obtain the coordinates of the source
from astropy.coordinates import SkyCoord

## 3. Finding and Downloading the data
This part assumes we know the ID of the VO service. Generally these are of the form: `ivo://nasa.heasarc/{table_name}`.

If you don't know the name of the table, you can search the VO registry, as illustrated in the <span style="color:red">add reference here when structure is restored</span>.
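As a small illustration of the ID pattern described above, the following sketch builds the service ID from a table name. The helper name `heasarc_ivoid` is hypothetical and is not part of `pyvo`.

```python
def heasarc_ivoid(table_name):
    """Build a service ID of the form ivo://nasa.heasarc/{table_name}."""
    return f"ivo://nasa.heasarc/{table_name.strip()}"


# the NuSTAR master catalog used in this tutorial
ivoid = heasarc_ivoid("numaster")
print(ivoid)
```

The resulting string is the value passed as `ivoid` to `pyvo.regsearch` in the next section.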

### 3.1 The Search Service
First, we create a cone search service:

In [3]:
# Create a cone-search service
nu_services = pyvo.regsearch(ivoid="ivo://nasa.heasarc/numaster")[0]
cs_service = nu_services.get_service("conesearch")

### 3.2 Find the Data

Next, we will use the `search` function in `cs_service` to search for observations around our source, 3C 105.

The `search` function takes the sky position as input, either as a list of `[RA, DEC]` or as an astropy sky coordinate object, `SkyCoord`.

The search result is then printed as an astropy Table for a clean display.

In [4]:
# Find the coordinates of the source
pos = SkyCoord.from_name("3c 105")

search_result = cs_service.search(pos)

# display the result as an astropy table
search_result.to_table()

| __row | name | ra | dec | time | obsid | status | exposure_a | observation_mode | obs_type | processing_date | public_date | issue_flag | Search_Offset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  | deg | deg | d |  |  | s |  |  | d | d |  |  |
| object | object | float64 | float64 | float64 | object | object | float64 | object | object | float64 | int32 | int16 | float64 |
| 3554 | 3C105 | 61.8022 | 3.6837 | 56338.0876 | 60061044002 | archived | 4807 | SCIENCE | EGS | 59168.6 | 57112 | 0 | 1.7172 |
| 3555 | 3C105 | 61.8059 | 3.6858 | 56339.164 | 60061044006 | archived | 5583 | SCIENCE | EGS | 59168.5 | 57112 | 0 | 1.4912 |
| 3556 | 3C105 | 61.8034 | 3.6859 | 56338.6258 | 60061044004 | archived | 6208 | SCIENCE | EGS | 59168.6 | 57112 | 0 | 1.568 |
| 3557 | 3C105 | 61.802 | 3.6872 | 57826.4661 | 60261003004 | archived | 20703 | SCIENCE | ELS | 59110.3 | 57836 | 0 | 1.5573 |
| 3559 | 3C105 | 61.841 | 3.7349 | 57621.2994 | 60261003002 | archived | 20737 | SCIENCE | ELS | 59113.7 | 57625 | 0 | 2.1365 |
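The `time` column is given in days. Assuming it is a Modified Julian Date (the table itself only states the unit `d`), a value can be converted to a calendar date with the standard library alone. This is a rough sketch: it ignores leap seconds and time-scale differences, and the helper name `mjd_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta

# MJD 0 corresponds to 1858-11-17 00:00
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date to a naive datetime (no leap-second handling)."""
    return MJD_EPOCH + timedelta(days=mjd)


# start time of the first observation in the table, assumed to be an MJD
start = mjd_to_datetime(56338.0876)
print(start.date())
```

Under that assumption, the first observation in the table started on 2013-02-15.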


### 3.3 Filter the Results

The search returned several entries.

Let's say we are interested only in observations with exposures of at most 10 ks (the `exposure_a` column is in seconds). We select them with a list comprehension over the search results.

In [5]:
obs_to_explore = [res for res in search_result if res["exposure_a"] <= 10000]
obs_to_explore

[('3554', '3C105', '61.8022', '3.6837', '56338.0876', '60061044002', 'archived', '4807.3775', 'SCIENCE', 'EGS', '59168.6', '57112', '0', '1.7172299582982673'),
 ('3555', '3C105', '61.8059', '3.6858', '56339.164', '60061044006', 'archived', '5582.6419', 'SCIENCE', 'EGS', '59168.5', '57112', '0', '1.4911511920888738'),
 ('3556', '3C105', '61.8034', '3.6859', '56338.6258', '60061044004', 'archived', '6208.4217', 'SCIENCE', 'EGS', '59168.6', '57112', '0', '1.5679511763430094')]
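The same selection can be tried offline on plain dictionaries, with the threshold expressed in ks and converted to seconds explicitly. This is only a sketch: the helper name `select_short_exposures` is hypothetical, and the two long exposures use the rounded values shown in the table display.

```python
# exposure values (in seconds) copied from the outputs above;
# the last two are the rounded values from the table display
rows = [
    {"obsid": "60061044002", "exposure_a": 4807.3775},
    {"obsid": "60061044006", "exposure_a": 5582.6419},
    {"obsid": "60061044004", "exposure_a": 6208.4217},
    {"obsid": "60261003004", "exposure_a": 20703.0},
    {"obsid": "60261003002", "exposure_a": 20737.0},
]


def select_short_exposures(rows, max_exposure_ks=10.0):
    """Return the rows whose exposure is at most `max_exposure_ks` kiloseconds."""
    max_exposure_s = max_exposure_ks * 1000.0  # convert ks to s
    return [row for row in rows if row["exposure_a"] <= max_exposure_s]


selected = select_short_exposures(rows)
print([row["obsid"] for row in selected])
```

Raising `max_exposure_ks` to 25 would keep all five observations.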

### 3.4 Find Links for the Data

The exposure selection resulted in 3 observations (this may change as more observations are collected). Let's try to download them for analysis.

To see what data products are available for these 3 observations, we use the VO's datalinks. A datalink is a way to query data products related to some search result.

The results of a datalink call will depend on the specific observation. To see the type of products that are available for our observations, we start by looking at one of them.

In [6]:
obs = obs_to_explore[0]
dlink = obs.getdatalink()

# only 3 summary columns are printed
dlink.to_table()[["ID", "access_url", "content_type"]]

| ID | access_url | content_type |
|---|---|---|
| object | object | object |
| ivo://nasa.heasarc/numaster?60061044002 | https://heasarc.gsfc.nasa.gov/xamin/bib?table=numaster&id=60061044002 | text/html |
| ivo://nasa.heasarc/numaster?60061044002 | https://heasarc.gsfc.nasa.gov/xamin/vo/datalink?datalink_key&id=ivo://nasa.heasarc/numaster?60061044002/nustar.obs | application/x-votable+xml;content=datalink |
| ivo://nasa.heasarc/numaster?60061044002 | https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/ | directory |


### 3.5 Filter the Links

Three products are available for our selected observation. From the `content_type` column, we see that one is a `directory` containing the observation files, and its `access_url` gives the direct URL to the data. The other two are another datalink service for housekeeping data and a document listing publications related to the selected observation.

We can now loop through our selected observations in `obs_to_explore` and extract the URLs with `content_type` equal to `directory`.

Note that a datalink result with no `directory` product indicates that no public data are available for that observation, likely because it is still proprietary.

In [7]:
# loop through the observations
links = []
for obs in obs_to_explore:
    dlink = obs.getdatalink()
    dlink_to_dir = [dl for dl in dlink if dl["content_type"] == "directory"]

    # if we have no directory product, the data is likely not public yet
    if len(dlink_to_dir) == 0:
        continue

    link = dlink_to_dir[0]["access_url"]
    print(link)
    links.append(link)

https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/
https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044006/
https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044004/
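All three observations here have a `directory` product, so the "no public data" branch is never exercised. The sketch below shows both cases offline, using plain dictionaries in place of datalink records. The helper name `first_directory_url` and the empty `proprietary_rows` list are hypothetical and only illustrate the logic.

```python
def first_directory_url(datalink_rows):
    """Return the access_url of the first 'directory' product, or None if there is none."""
    return next(
        (row["access_url"] for row in datalink_rows if row["content_type"] == "directory"),
        None,
    )


# two of the products listed above for observation 60061044002
public_rows = [
    {
        "access_url": "https://heasarc.gsfc.nasa.gov/xamin/bib?table=numaster&id=60061044002",
        "content_type": "text/html",
    },
    {
        "access_url": "https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/",
        "content_type": "directory",
    },
]

# hypothetical observation with no datalink products at all
proprietary_rows = []

print(first_directory_url(public_rows))
print(first_directory_url(proprietary_rows))
```

Returning `None` for the missing case lets the caller skip such observations with a simple `if link is None: continue`.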


### 3.6 Download the Data

On Sciserver, all the data is available locally under `/FTP/`, so we only need the part of each link after `FTP`, which we use to copy the data to the current directory.
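The mapping from a download link to its local path can be sketched without touching the file system. The helper name `sciserver_path` is hypothetical. It assumes, as stated above, that the local tree under `/FTP/` mirrors the URL path, and it additionally collapses the doubled slash that appears in the links.

```python
import posixpath
from urllib.parse import urlparse


def sciserver_path(link):
    """Map a HEASARC download URL to the assumed local path under /FTP/."""
    url_path = urlparse(link).path
    _, sep, tail = url_path.partition("/FTP/")
    if not sep:
        raise ValueError(f"no /FTP/ component in {link!r}")
    # normpath collapses repeated slashes and drops the trailing slash
    return posixpath.normpath("/FTP/" + tail)


link = "https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/"
local_path = sciserver_path(link)
print(local_path)
```

For the first link this gives `/FTP/nustar/data/obs/00/6/60061044002`, which could then be handed to `shutil.copytree` or `cp -r`.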


If this is run outside Sciserver, we can download the data directories using `wget` (or `curl`).

Set `on_sciserver` to `False` if using this notebook outside Sciserver.

In [8]:
# detect Sciserver from the home directory name; .get avoids a KeyError if HOME is unset
on_sciserver = os.environ.get("HOME", "").split("/")[-1] == "idies"

if on_sciserver:
    # copy data locally on sciserver
    for link in links:
        os.system(f"cp -r /FTP/{link.split('FTP')[1]} .")

else:
    # use wget to download the data
    wget_cmd = (
        "wget -q -nH --no-check-certificate --no-parent --cut-dirs=6 "
        "-r -l0 -c -N -np -R 'index*' -erobots=off --retr-symlinks {}"
    )

    for link in links:
        os.system(wget_cmd.format(link))
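As an alternative to formatting a shell string, the same `wget` options can be kept as an argument list, which avoids shell quoting issues and makes it easy to check the command before running it. This is a sketch, not a replacement for the cell above: the helper name `build_wget_command` is hypothetical, the options are copied unchanged from that cell, and nothing is downloaded here.

```python
import shlex

# same options as in the wget command above, one list item per argument
WGET_OPTIONS = [
    "-q", "-nH", "--no-check-certificate", "--no-parent", "--cut-dirs=6",
    "-r", "-l0", "-c", "-N", "-np", "-R", "index*", "-erobots=off", "--retr-symlinks",
]


def build_wget_command(link):
    """Return the wget command for one data directory as an argument list."""
    return ["wget", *WGET_OPTIONS, link]


link = "https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6//60061044002/"
cmd = build_wget_command(link)

# show the command as it would be typed in a shell
print(shlex.join(cmd))

# to actually download, pass the list to subprocess, for example:
#   import subprocess
#   subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

The value `--cut-dirs=6` matches the six named directory levels (`FTP/nustar/data/obs/00/6`) that sit above the observation directory in the links printed earlier.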