In [None]:
## setup the notebook

%load_ext autoreload
%autoreload 2

import sys
import os
import astropy.coordinates as coord
import pyvo
sys.path.insert(0, os.getcwd())
import fornax
print(f'\nUsing fornax library in: {fornax.__file__}\n')
pos = coord.SkyCoord.from_name("ngc 4151")


---
# A Simple Use Case:
The simplest case is:
> The user queries an on-prem service and gets the addresses where a file lives (both on-prem and S3). The user then manually requests the file from the cloud and downloads/reads it. (This is the first step before doing anything smart inside the code with geolocation or bucket policies.)

In [None]:
# Query data provider 
query_url = 'https://mast.stsci.edu/portal_vo/Mashup/VoQuery.asmx/SiaV1?MISSION=HST&'
query_result = pyvo.dal.sia.search(query_url, pos=pos, size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')
data_product = table_result[0]


# Get on-prem data
prem_handle = fornax.get_data_product(data_product, access_url_column=col_name)
prem_handle.download()


# Get AWS data
aws_handle = fornax.get_data_product(data_product, 'aws', access_url_column=col_name)
aws_handle.download()

---

# Cloud access use cases:
Here we document the use cases this library handles (raised in issue #1), with usage examples.


## 1. Public Data
Public data is, by definition, accessible from anywhere. The user need not be on AWS.

The information that the data is public is provided as part of the data product metadata.

- **Example:** HST data on the Space Telescope open data bucket `stpubdata`.
- **Authentication:** Not needed.
- **Run on:** Anywhere (AWS or local).

In [None]:
query_url = 'https://mast.stsci.edu/portal_vo/Mashup/VoQuery.asmx/SiaV1?MISSION=HST&'
query_result = pyvo.dal.sia.search(query_url, pos=pos, size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')
data_product = table_result[0]

fornax.get_data_product(data_product, 'aws', access_url_column=col_name)

This is another example using data configured from daskhub:

- **Example:** Accessing Chandra data from `dh-fornaxdev-public`.
- **Authentication:** Not needed.
- **Run on:** Anywhere (AWS or local).

Note that here, we are injecting the name of the bucket in the code rather than changing the server response. This is a quicker way to do it.

In [None]:
query_url = 'https://heasarc.gsfc.nasa.gov/xamin_aws/vo/sia?table=chanmaster&'
query_result = pyvo.dal.sia.search(query_url, pos=pos, size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')
data_product = table_result[0]


data_product['cloud_access'] = data_product['cloud_access'].replace(
    'dh-fornaxdev', 'dh-fornaxdev-public').replace(
    '"access": "region"', '"access": "open"')
fornax.get_data_product(data_product, 'aws', access_url_column=col_name)
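
The string replacement above works because `cloud_access` is a JSON string. A slightly more robust variant, sketched here, parses the JSON and edits the fields explicitly. The helper name `retarget_cloud_access` is hypothetical and not part of `fornax`, and the JSON layout (an `aws` entry with `bucket_name`, `region`, `access` and `key`) is assumed to match the template used for the Spitzer example later in this notebook.

```python
import json


def retarget_cloud_access(cloud_access, bucket_name, access):
    """Return a copy of a cloud_access JSON string with a new bucket and access mode.

    Hypothetical helper: assumes the layout
    {"aws": {"bucket_name": ..., "region": ..., "access": ..., "key": ...}}.
    """
    info = json.loads(cloud_access)
    info["aws"]["bucket_name"] = bucket_name
    info["aws"]["access"] = access
    return json.dumps(info)


# Made-up record for illustration only; a real one comes from the query result.
example = ('{"aws": {"bucket_name": "dh-fornaxdev", "region": "us-east-1", '
           '"access": "region", "key": "some/key.fits"}}')
updated = retarget_cloud_access(example, "dh-fornaxdev-public", "open")
print(updated)
```

Unlike a plain `str.replace`, this only touches the two named fields, so a bucket name that happens to appear inside the key is left alone.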

<br />

## 2. Restricted Data
This is the case where data access is allowed only when the user is authenticated and has access rights to the data. The data `access` mode should be `'restricted'`.

<p style='color:red; font-size:22px; style:bold; background:yellow'> Note:</p>
We don't yet have a strictly region-restricted bucket, so the `access` mode `'region'`, where access to the bucket would be region-restricted, cannot really be tested at the moment.


### 2.1 No Credentials Provided
- **Example:** Data in `dh-fornaxdev`
- **Authentication:** Not provided.
- **Run on:** If run outside daskhub, it should fail. If run in daskhub, we have credentials in the environment, so we fall back to **section 2.2.1**.

In [None]:
query_url = 'https://heasarc.gsfc.nasa.gov/xamin_aws/vo/sia?table=chanmaster&'
query_result = pyvo.dal.sia.search(query_url, pos=pos, size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')
data_product = table_result[0]
fornax.get_data_product(data_product, 'aws', access_url_column=col_name)

The message indicates that:
- we tried accessing the data anonymously and got a Forbidden (403) error
- next, we tried searching for credentials in the environment variables, and that failed (unless we are inside daskhub, in which case we use the environment credentials).
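
The fallback order described above can be sketched in plain Python. This is not the `fornax` implementation: `first_working_access`, `AccessDenied`, and the two stand-in access functions are hypothetical names used only to illustrate trying anonymous access first and environment credentials second.

```python
class AccessDenied(Exception):
    """Stand-in for a 403 Forbidden response (hypothetical)."""


def first_working_access(attempts):
    """Try each (label, callable) pair in order and return the first success."""
    errors = []
    for label, attempt in attempts:
        try:
            return label, attempt()
        except AccessDenied as err:
            errors.append(f"{label}: {err}")
    raise AccessDenied("all access attempts failed: " + "; ".join(errors))


def anonymous_access():
    # Simulates a restricted bucket rejecting an unauthenticated request.
    raise AccessDenied("403 Forbidden")


def environment_access():
    # Simulates success when credentials are found in the environment.
    return "handle-from-environment-credentials"


label, handle = first_working_access([
    ("anonymous", anonymous_access),
    ("environment", environment_access),
])
print(label, handle)
```

If every attempt fails, the collected messages are raised together, which mirrors the kind of combined message described above.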

---
### 2.2 Credentials Provided

#### 2.2.1 Credentials provided by the environment
- **Example:** Data in `dh-fornaxdev`
- **Authentication:** By the environment (`$AWS_ROLE_ARN`).
- **Run on:** daskhub (No access outside daskhub).

In [None]:
query_url = 'https://heasarc.gsfc.nasa.gov/xamin_aws/vo/sia?table=chanmaster&'
query_result = pyvo.dal.sia.search(query_url, pos=pos, size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')
data_product = table_result[0]
fornax.get_data_product(data_product, 'aws', access_url_column=col_name)

Another example using Spitzer data in bucket `irsa-mast-tike-spitzer-data`. Here we use the IRSA SIA service and then add the `cloud_access` column by hand.

- **Example:** Data in `irsa-mast-tike-spitzer-data`
- **Authentication:** By the environment (`$AWS_ROLE_ARN`).
- **Run on:** daskhub (No access outside daskhub).

In [None]:
query_url = ('https://irsa.ipac.caltech.edu/cgi-bin/Atlas/nph-atlas?mission=SEIP&hdr_location='
             '%5CSEIPDataPath%5C&SIAP_ACTIVE=1&collection_desc=SEIP&')
query_result = pyvo.dal.sia.search(query_url, pos=coord.SkyCoord(151.1, 2.0, unit="deg"), size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')

# inject the cloud_access column #
urls = table_result['sia_url'].tolist()
json_template = '{"aws": { "bucket_name": "irsa-mast-tike-spitzer-data", "region": "us-east-1", "access": "restricted", "key": "%s" }}'
# the key is the URL path with the scheme and host removed
json_col = [json_template % '/'.join(u.split('/')[3:]) for u in urls]
table_result.add_column(json_col, name='cloud_access')
# ------------------------------ #

data_product = table_result[30]
fornax.get_data_product(data_product, 'aws', access_url_column=col_name)
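
The key in the template above is the URL path with the scheme and host removed. As a sketch, the same column entry can be built with `urllib.parse` and `json.dumps`, which avoids hand-written JSON. The function name `make_cloud_access` and the example URL are hypothetical; the bucket name, region and access mode are the ones used in the cell above.

```python
import json
from urllib.parse import urlsplit


def make_cloud_access(url, bucket_name="irsa-mast-tike-spitzer-data",
                      region="us-east-1", access="restricted"):
    """Build a cloud_access JSON string whose key is the URL path (hypothetical helper)."""
    key = urlsplit(url).path.lstrip("/")
    return json.dumps({
        "aws": {
            "bucket_name": bucket_name,
            "region": region,
            "access": access,
            "key": key,
        }
    })


# Made-up URL for illustration; real ones come from table_result['sia_url'].
example_url = "https://example.org/data/SPITZER/image.fits"
entry = make_cloud_access(example_url)
print(entry)
```

Note that `urlsplit(...).path` drops any query string, whereas the `split('/')` approach keeps it, so the two only agree for URLs without a query part.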

#### 2.2.1 Credentials provided by the environment, example 2
- **Example:** Data in `heasarc-1` (configured on NGAP). 
- **Authentication:** By the environment (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN`).
- **Run on:** Anywhere (No access if no credentials are provided).

Note again that the bucket name is injected in the code rather than by modifying the server response.

If you have access to NGAP, then you can generate access credentials on Kion.
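
Before running the next cell, it can help to confirm that the three credential variables are actually set. A minimal sketch; the helper name `missing_aws_variables` is hypothetical and not part of `fornax`.

```python
import os

REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def missing_aws_variables(environ=None):
    """Return the names of required credential variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


missing = missing_aws_variables()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All three credential variables are set.")
```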

In [None]:
query_url = 'https://heasarc.gsfc.nasa.gov/xamin_aws/vo/sia?table=chanmaster&'
query_result = pyvo.dal.sia.search(query_url, pos=pos, size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')
data_product = table_result[0]

data_product['cloud_access'] = data_product['cloud_access'].replace(
    'dh-fornaxdev', 'heasarc-1')

fornax.get_data_product(data_product, 'aws', access_url_column=col_name)

#### 2.2.2 Credentials provided by passing a `profile`
- **Example:** Data in `heasarc-1` (configured on NGAP). 
- **Authentication:** We use the profile name `ngap_user` in `~/.aws/credentials`. The file may look something like:
```
[ngap_user]
aws_access_key_id=SOME_CODE
aws_secret_access_key=SOME_CODE
aws_session_token=SOME_CODE
```
- **Run on:** In principle anywhere, but may depend on the bucket policy. For `heasarc-1`, the bucket is configured to be accessible from anywhere if the user has the credentials.

If you have access to NGAP, then you can generate access credentials on Kion.
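
Since `~/.aws/credentials` is an INI-style file, the standard-library `configparser` can confirm that the profile exists before its name is passed to `fornax`. This is only a sketch: `profile_has_keys` is a hypothetical helper, and the temporary file stands in for the real credentials file.

```python
import configparser
import os
import tempfile

EXPECTED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def profile_has_keys(path, profile):
    """Return True if the profile section exists and holds all expected keys."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser.has_section(profile) and all(
        parser.has_option(profile, key) for key in EXPECTED_KEYS
    )


# Write a throwaway file with the same layout as the example above.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "credentials")
    with open(path, "w") as handle:
        handle.write(
            "[ngap_user]\n"
            "aws_access_key_id=SOME_CODE\n"
            "aws_secret_access_key=SOME_CODE\n"
            "aws_session_token=SOME_CODE\n"
        )
    found = profile_has_keys(path, "ngap_user")
    absent = profile_has_keys(path, "other_user")

print(found, absent)
```

To check the real file, pass `os.path.expanduser("~/.aws/credentials")` as `path`.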

In [None]:
query_url = 'https://heasarc.gsfc.nasa.gov/xamin_aws/vo/sia?table=chanmaster&'
query_result = pyvo.dal.sia.search(query_url, pos=pos, size=0.0)
table_result = query_result.to_table()
col_name = query_result.fieldname_with_ucd('VOX:Image_AccessReference')
data_product = table_result[0]

data_product['cloud_access'] = data_product['cloud_access'].replace(
    'dh-fornaxdev', 'heasarc-1')

fornax.get_data_product(data_product, 'aws', access_url_column=col_name, profile='ngap_user')