# SPARCLCLIENT Example Usage

In [None]:
__author__ = 'Steve Pothier <steve.pothier@noirlab.edu>'
__version__ = '20240224'  # yyyymmdd
__keywords__ = ['HowTo', 'astronomy', 'tutorial', 'client', 'sparcl', 'NOIRlab']

## Table of contents
* [Goals & Summary](#goals)
* [Imports and setup](#imports)
* [Install SPARCLCLIENT](#install)
* [Configure SPARCLCLIENT](#prepare)
* [Get general info from SPARCL](#info)
* [Get Metadata and Spectra](#get)

<a class="anchor" id="goals"></a>
## Goals & Summary 
Demonstrate how to use the `sparclclient` package to get metadata and spectral data from the [NOIRLab SPARCL Server](https://astrosparcl.datalab.noirlab.edu/). Also show how to get non-public data if you have authorized credentials.
- Discovery: Search for matching metadata and return metadata records.
- Retrieval: Retrieve spectra for the discovered records.

<a class="anchor" id="imports"></a>
## Imports and Setup

In [None]:
from pprint import pformat as pf
from pprint import pp
import os.path
from importlib import reload
from collections import defaultdict
from datetime import datetime
import warnings
from getpass import getpass

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

class StopExecution(Exception):
    def _render_traceback_(self):
        pass

# %matplotlib inline
# requires installing ipympl
%matplotlib widget
plt.rcParams['font.size'] = 14

<a class="anchor" id="install"></a>
## Install most recent version of the SPARCLCLIENT
*NOTE: After installing the most recent version, please restart your kernel.*

In [None]:
# !pip install --upgrade sparclclient         # Latest released version
# !pip install --pre --upgrade sparclclient   # Latest pre-released version

# The next line installs SPARCLCLIENT from the local source tree (two directories up).
# Comment it out and uncomment one of the lines above to use a released version instead.
!pip install --pre --upgrade ../..

In [None]:
import sparcl.client
print(f'Run started: {str(datetime.now())}')

<a class="anchor" id="prepare"></a>
# Configure SPARCLCLIENT

In [None]:
# How much output do we want to show?
show_help = False   # HELP for client functions
show_curl = False   # Show the underlying SPARCL Server API call

# Select ONE server: only the uncommented assignment takes effect.
# server = 'https://astrosparcl.datalab.noirlab.edu'  # Public Server
# server = 'https://sparc1.datalab.noirlab.edu'       # internal TEST Server
server = 'http://localhost:8050'                      # internal DEV Server

priv_dr = 'SDSS-DR17'

# Authenticated Users that are never authorized for anything important.
# These are authenticated on both Public and Test SSO servers.
auth_user   = 'test_user_1@noirlab.edu'
unauth_user = 'test_user_2@noirlab.edu'
non_user    = 'test_user_3@noirlab.edu'
usrpw = getpass()

In [None]:
if show_help:
    help(sparcl.client.SparclClient)
client = sparcl.client.SparclClient(url=server, show_curl=show_curl)
print(f'{client=}')

<a class="anchor" id="info"></a>
# General Info from SPARCL

<a class="anchor" id="datasets"></a>
## Data sets available
List all data sets currently available from the server (`url`) associated with `client`.

In [None]:
client.all_datasets

<a class="anchor" id="defaultfieldnames"></a>
## Default field names
Gets fields tagged as 'default' that are common to all data sets in the `dataset_list` passed to the function. If `dataset_list` is None (the default), the function returns the intersection of 'default' fields across all datasets currently available in the SPARC database. The following example of this function produces the same output as it would with no `dataset_list` argument because we currently only have SDSS-DR16 and BOSS-DR16 records in the SPARC database.
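The intersection rule described above can be illustrated offline. This is a minimal sketch, not the client's implementation: the per-dataset field sets below (including the `HYPOTHETICAL-DR` dataset) are invented for illustration, and the real lists come from the SPARCL server.

```python
# Hypothetical per-dataset 'default' field sets, for illustration only;
# the real lists are returned by the SPARCL server.
default_fields_by_dataset = {
    'SDSS-DR16': {'sparcl_id', 'specid', 'ra', 'dec', 'redshift', 'flux', 'wavelength'},
    'BOSS-DR16': {'sparcl_id', 'specid', 'ra', 'dec', 'redshift', 'flux', 'wavelength'},
    'HYPOTHETICAL-DR': {'sparcl_id', 'ra', 'dec', 'flux', 'wavelength'},
}

def common_fields(fields_by_dataset, dataset_list=None):
    """Return the fields shared by every dataset in dataset_list.

    If dataset_list is None, use every known dataset, mirroring the
    behavior described for get_default_fields.
    """
    names = list(fields_by_dataset) if dataset_list is None else dataset_list
    return set.intersection(*(fields_by_dataset[n] for n in names))

print(sorted(common_fields(default_fields_by_dataset, ['SDSS-DR16', 'BOSS-DR16'])))
print(sorted(common_fields(default_fields_by_dataset)))  # smaller: intersection over all three
```

Adding a dataset to the list can only shrink (never grow) the set of common fields, which is why the result with no `dataset_list` equals the explicit result only while the listed data sets are the only ones on the server.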

In [None]:
if show_help:
    client.get_default_fields?

In [None]:
client.get_default_fields(dataset_list=['SDSS-DR16', 'BOSS-DR16'])

<a class="anchor" id="allfieldnames"></a>
## All field names
Gets fields tagged as 'all' that are common to all data sets in the `dataset_list` passed to the function. If `dataset_list` is None (the default), the function returns the intersection of 'all' fields across all datasets currently available in the SPARC database.  The following example of this function produces the same output as it would with no `dataset_list` argument because we currently only have SDSS-DR16 and BOSS-DR16 records in the SPARC database.

In [None]:
if show_help:
    client.get_all_fields?

In [None]:
print(sorted(client.get_all_fields(dataset_list=['SDSS-DR16', 'BOSS-DR16'])))

## Version of Server API used by this client
The SPARCL Client you use must be compatible with the version of the SPARCL Server you use. The server is specified with the `url` parameter of `sparcl.client.SparclClient`. If the Server and Client are incompatible, executing `SparclClient()` will instruct you to upgrade your client.

In [None]:
client.version

<a class="anchor" id="get"></a>
# Get Metadata and Spectra

<a class="anchor" id="find"></a>
## Get Metadata: `client.find`

The first way to discover your data is SPARCL's `client.find()` method, which finds records in the SPARCL database that match the parameters you pass to it. Only Core fields may be used in the `outfields` and `constraints` parameters. Descriptions of all fields, including Core fields, are available [here](https://astrosparcl.datalab.noirlab.edu/sparc/sfc/). The SPARCL Core fields and their constraint types are:


| Field name       | Constraint type | Example |
|:----------------|:---------------|:-------|
| id               | List of values (but not<br>intended for data discovery) | ['00001658-460c-4da1-987d-e493d8c9b89b',<br>'000017b6-56a2-4f87-8828-3a3409ba1083']
| specid           | List of values | [6988698046080241664, 6971782884823945216]
| targetid         | List of values | [1237679502171374316, 1237678619584692841]
| data_release     | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['BOSS-DR16', 'SDSS-DR16']
| datasetgroup     | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['SDSS_BOSS']
| ra               | Range of values (may not<br>"wrap" around RA=0) | [44.53, 47.96]
| dec              | Range of values | [2.03, 7.76]
| redshift         | Range of values | [0.5, 0.9]
| redshift_err     | Range of values | [0.000225, 0.000516]
| redshift_warning | List of values  | [0, 3, 5]
| spectype         | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['GALAXY', 'STAR']
| instrument       | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['SDSS', 'BOSS']
| telescope        | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['sloan25m']
| site             | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) |  ['apo']
| specprimary      | List of values (but typically<br>would only include 1 if<br>being used for data<br>discovery constraints) | [1]
| wavemin          | Range of values | [3607, 3608]
| wavemax          | Range of values | [10363, 10364]
| dateobs_center   | Range of values | ['2013-03-14T10:16:17Z',<br>'2014-05-24T12:10:00Z']
| exptime          | Range of values | [3603.46, 3810.12]
| updated          | Range of values | ['2022-08-20T21:37:50.636363Z',<br>'2022-09-20T20:00:00.000000Z']


In [None]:
if show_help:
    client.find?

#### Define fields and constraints for metadata FIND
Define the fields we want returned (`outfields`) and the constraints to apply (`constraints`).

In [None]:
out = ['sparcl_id','specid', 'ra', 'dec', 'redshift', 'spectype', 'data_release', 'redshift_err']
cons = {'spectype': ['GALAXY'],
        'redshift': [0.5, 0.9],
        'data_release': ['BOSS-DR16', 'SDSS-DR16']}
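The Core fields table notes that an `ra` range may not "wrap" around RA=0. One way to cover a region that straddles RA=0 is to split it into two non-wrapping ranges and run one FIND per range. A minimal sketch, assuming ranges are in degrees and that an upper bound of 360 is accepted by the server; `split_ra_range` is a hypothetical helper, not part of `sparclclient`:

```python
def split_ra_range(ra_min, ra_max):
    """Split an RA interval (degrees) that may cross RA=0 into non-wrapping pieces.

    A request like [350, 10] means 350..360 plus 0..10. Because the `ra`
    constraint may not wrap around RA=0, each piece needs its own FIND.
    """
    if ra_min <= ra_max:
        return [[ra_min, ra_max]]
    return [[ra_min, 360.0], [0.0, ra_max]]

base_cons = {'spectype': ['GALAXY'], 'dec': [2.03, 7.76]}
cons_list = [{**base_cons, 'ra': piece} for piece in split_ra_range(350.0, 10.0)]
for c in cons_list:
    print(c)
# With a live client, each dict could be passed to
# client.find(outfields=out, constraints=c) and the returned records combined.
```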

#### Execute FIND
Execute the `client.find()` method with our parameters.
The `limit` argument is used here for demonstration purposes only; it returns just the first 20 matching records.

In [None]:
found = client.find(outfields=out, constraints=cons, limit=20)

In [None]:
pd.DataFrame.from_records(found.records)

<a class="anchor" id="retrieve"></a>
## Get Spectra: `client.retrieve`

In order to retrieve spectra records from SPARCL, pass the following to the `client.retrieve()` function:
```
uuid_list : List of IDs.
dataset_list : List of data sets to search for the IDs in (default: None).
include : List of field names to include in each record (default: 'DEFAULT').
```

**NOTE: A reasonable number of records to retrieve in a single request is about 10,000. Requesting more may cause the retrieval to time out or fail.**
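To stay under that guideline with a large ID list, the IDs can be split into batches and retrieved one batch at a time. A minimal sketch; `chunked` is a hypothetical helper, and the batch size of 10,000 simply follows the note above:

```python
def chunked(ids, size=10_000):
    """Yield successive slices of `ids` with at most `size` elements each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# Hypothetical list of 25,000 IDs, for illustration only.
ids = [f'id-{i}' for i in range(25_000)]
batches = list(chunked(ids))
print([len(b) for b in batches])  # [10000, 10000, 5000]

# With a live client, each batch could be retrieved separately, e.g.:
# records = []
# for batch in chunked(list(found.ids)):
#     records.extend(client.retrieve(uuid_list=batch, include=inc).records)
```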

In [None]:
if show_help:
    client.retrieve?

#### Use IDs from FIND to RETRIEVE records
Use the IDs from the output of using `client.find()` to retrieve records from SPARCL. 

Note that `ids` in `found.ids` is a property of the `Found` class that returns the IDs of all found records; it is not a field name of a record.

In [None]:
# Define the fields to include in the retrieve function
inc = ['specid', 'data_release', 'redshift', 'flux', 'wavelength', 'model', 'ivar', 'mask', 'spectype']

In [None]:
%%time
results = client.retrieve(uuid_list=found.ids,
                          include=inc,
                          dataset_list=['SDSS-DR16','BOSS-DR16'])
results.info

In [None]:
results.records[0]

## Plot spectra

In [None]:
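recs = results.records
idx = 0
fig = plt.figure(1, figsize=(8, 4), dpi=100, facecolor='w', edgecolor='k')
# Plot against wavelength (requested via `inc`) rather than pixel index.
fline, = plt.plot(recs[idx].wavelength, recs[idx].flux, label='flux')
mline, = plt.plot(recs[idx].wavelength, recs[idx].model, label='model')
plt.xlabel('Wavelength')
plt.ylabel('Flux')
plt.legend(handles=[fline, mline])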
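The retrieved records also include `ivar` (inverse variance) and `mask`. A minimal sketch of turning them into per-pixel 1-sigma uncertainties, assuming (as in SDSS conventions) that `ivar <= 0` or a nonzero `mask` value marks an unusable pixel; `flux_uncertainty` is a hypothetical helper and the arrays are synthetic stand-ins for `recs[idx].ivar` and `recs[idx].mask`:

```python
import numpy as np

def flux_uncertainty(ivar, mask=None):
    """Convert inverse variance to 1-sigma uncertainty.

    Pixels with ivar <= 0, or with a nonzero mask value, get NaN so
    that matplotlib leaves gaps instead of plotting them.
    """
    ivar = np.asarray(ivar, dtype=float)
    bad = ivar <= 0
    if mask is not None:
        bad |= np.asarray(mask) != 0
    sigma = np.full(ivar.shape, np.nan)
    sigma[~bad] = 1.0 / np.sqrt(ivar[~bad])
    return sigma

# Synthetic stand-ins for recs[idx].ivar and recs[idx].mask
ivar = np.array([4.0, 0.0, 25.0, 1.0])
mask = np.array([0, 0, 0, 16])
print(flux_uncertainty(ivar, mask))  # [0.5 nan 0.2 nan]
```

With real data, the result could be drawn as a band, e.g. `flux = np.asarray(recs[idx].flux)` and `plt.fill_between(recs[idx].wavelength, flux - sigma, flux + sigma, alpha=0.3)`.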

## Plot FLUX for all records

In [None]:
print('Ignoring unsupported feature: align_records')

In [None]:
#import sparcl.gather_2d
#ar_dict, grid = sparcl.gather_2d.align_records(results.records)
#modeldf = pd.DataFrame(data=ar_dict['flux'],columns=grid)
#modeldf.transpose().plot(xlabel='Wavelength', ylabel='Flux', legend=False)
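Because `align_records` is not supported here, one substitute (a swapped-in technique, not the `sparcl.gather_2d` API) is to linearly interpolate each record's flux onto a shared wavelength grid with `np.interp` and build the DataFrame that the commented-out code intends. A sketch with synthetic spectra standing in for `results.records`; `align_flux` and the 100-unit grid spacing are assumptions for illustration:

```python
import numpy as np
import pandas as pd

def align_flux(spectra, grid):
    """Interpolate each (wavelength, flux) pair onto a common grid.

    Grid points outside a spectrum's wavelength coverage become NaN
    rather than being extrapolated.
    """
    rows = []
    for wavelength, flux in spectra:
        wavelength = np.asarray(wavelength, dtype=float)
        flux = np.asarray(flux, dtype=float)
        rows.append(np.interp(grid, wavelength, flux, left=np.nan, right=np.nan))
    return pd.DataFrame(data=np.vstack(rows), columns=grid)

# Synthetic stand-ins for (rec.wavelength, rec.flux) from results.records
spec_a = (np.linspace(3600, 10400, 50), np.linspace(1.0, 2.0, 50))
spec_b = (np.linspace(3800, 9200, 40), np.linspace(5.0, 3.0, 40))
grid = np.arange(3600.0, 10401.0, 100.0)
fluxdf = align_flux([spec_a, spec_b], grid)
print(fluxdf.shape)  # (2, 69)
```

With real data, the input could be `[(r.wavelength, r.flux) for r in results.records]`, followed by `fluxdf.transpose().plot(xlabel='Wavelength', ylabel='Flux', legend=False)`. Linear interpolation is simple but does not conserve flux; a flux-conserving resampler may be preferable for quantitative work.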

# Authorization
Your access to data depends on how you log in (or whether you log in at all). Both `client.find` and `client.retrieve` let you request data (possibly implicitly) from specific Datasets. It's possible for your combination of LOGIN and FIND (or RETRIEVE) to work now but fail later without you changing anything. For instance, if you don't log in and ask for data from ALL Datasets at a time when all Datasets are public, your FIND will succeed. But if NOIRLab later adds a private Dataset, the same FIND will fail. To avoid that failure, either explicitly request only the public Datasets, or log in as a user who is authorized to access the private Dataset.

To summarize, your FIND or RETRIEVE will be authorized in three cases:
1. All Datasets are Public (your login status does not matter).
2. You have explicitly requested only Public Datasets (your login status does not matter).
3. You are logged in and are authorized to access every Private Dataset you have requested (explicitly or implicitly).

You might be authorized to access one Dataset but not another, so in case #3 be careful to explicitly request the correct Private Dataset(s).
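The three cases above reduce to one rule: every private Dataset in the request must be one the user is allowed to access. A minimal sketch of that rule as a local model, not the server's implementation; `is_authorized` is hypothetical, and the dataset names follow this notebook (with `SDSS-DR17` playing the role of `priv_dr`). A request with no explicit dataset list should be modeled as requesting every Dataset.

```python
def is_authorized(requested, public, user_allowed=frozenset()):
    """Model the three authorization cases described above.

    requested:    datasets the FIND/RETRIEVE asks for (explicitly or implicitly)
    public:       datasets that are public
    user_allowed: private datasets the logged-in user may access
                  (empty for anonymous or unauthorized users)
    """
    private_requested = set(requested) - set(public)
    # Cases 1 and 2: nothing private is requested, so the request is allowed.
    # Case 3: every requested private dataset must be in user_allowed.
    return private_requested <= set(user_allowed)

public = {'SDSS-DR16', 'BOSS-DR16'}
print(is_authorized(['SDSS-DR16', 'BOSS-DR16'], public))                 # True
print(is_authorized(['BOSS-DR16', 'SDSS-DR17'], public))                 # False
print(is_authorized(['BOSS-DR16', 'SDSS-DR17'], public, {'SDSS-DR17'}))  # True
```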

## Logging in and logging out

In [None]:
if show_help:
    client.login?
    client.logout?

In [None]:
client.login(auth_user, usrpw)

In [None]:
client.authorized

In [None]:
client.logout()   # can also be done with client.login(None)

In [None]:
client.authorized

## FIND

### Pass FIND with Public DRs as Anonymous

In [None]:
client.logout()

In [None]:
out = ['sparcl_id','specid', 'ra', 'dec', 'redshift', 'spectype', 'data_release', 'redshift_err']
cons = {'spectype': ['GALAXY'],
        'redshift': [0.5, 0.9],
        'data_release': ['BOSS-DR16', 'SDSS-DR16']}
found = client.find(outfields=out, constraints=cons, limit=2)
pp(found.info)
print(found.records[0])
print(f'\nSUCCESS: {found.count=} records from FIND')

### Fail FIND with private DR as Anonymous

In [None]:
client.authorized

In [None]:
out = ['sparcl_id','specid', 'ra', 'dec', 'redshift', 'spectype', 'data_release', 'redshift_err']
cons = {'spectype': ['GALAXY'],
        'redshift': [0.5, 0.9],
        'data_release': ['BOSS-DR16',priv_dr]}
try:
    found = client.find(outfields=out, constraints=cons, limit=2)
    print('FOUND info:')
    pp(found.info)
    print(f'\nFOUND records. {found.records[0]=}')
    gotrecord = True
except Exception as err:
    gotrecord = False
    print(f'SUCCESS: Could not execute find: {err}')

if gotrecord:
    raise Exception(f'Wrongly got record from PRIVATE DR {priv_dr}')
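The try/except/`gotrecord` pattern above is repeated for each negative test in this notebook. A minimal sketch of factoring it into one helper; `expect_failure` is hypothetical (not part of `sparclclient`), and the offline demonstration uses a division by zero in place of a real FIND:

```python
def expect_failure(action, description):
    """Run `action` and return the exception it raises.

    Raise AssertionError if `action` unexpectedly succeeds, mirroring the
    'Wrongly got record' checks in this notebook.
    """
    try:
        action()
    except Exception as err:
        print(f'SUCCESS: {description} failed as expected: {err}')
        return err
    raise AssertionError(f'{description} unexpectedly succeeded')

# Offline demonstration. With a live client you might write, e.g.:
# expect_failure(lambda: client.find(outfields=out, constraints=cons, limit=2),
#                f'FIND on private DR {priv_dr}')
err = expect_failure(lambda: 1 / 0, 'division by zero')
```

The `AssertionError` is raised outside the `try` block, so it is not swallowed by the `except` clause.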

### Fail FIND with private DR as Unauthorized

In [None]:
client.login(unauth_user, usrpw)

In [None]:
try:
    found = client.find(outfields=out, constraints=cons, limit=2)
    print('FOUND info:')
    pp(found.info)
    print(f'\nFOUND records. {found.records[0]=}')
    gotrecord = True
except Exception as err:
    gotrecord = False
    print(f'SUCCESS: Could not execute find: {err}')

if gotrecord:
    raise Exception(f'Wrongly got record from PRIVATE DR {priv_dr}')

### Pass FIND with private DR as Authorized

In [None]:
client.login(auth_user, usrpw)

In [None]:
client.authorized

In [None]:
found = client.find(outfields=out, constraints=cons, limit=2)
print('FOUND info:')
pp(found.info)

### Fail FIND with Unknown user
The user is authenticated with SSO but is unknown to SPARCL.

In [None]:
client.login(non_user, usrpw)

In [None]:
try:
    found = client.find(outfields=out, constraints=cons, limit=2)
    print('FOUND info:')
    pp(found.info)
    print(f'\nFOUND records. {found.records[0]=}')
    gotrecord = True
except Exception as err:
    gotrecord = False
    print(f'SUCCESS: Could not execute find: {err}')

if gotrecord:
    raise Exception(f'Wrongly got record from PRIVATE DR {priv_dr}')

## RETRIEVE

### Pass RETRIEVE with public DRs as Anonymous

In [None]:
client.logout()   # the previous section left non_user logged in
client.authorized

In [None]:
inc = ['specid', 'data_release', 'redshift', 'flux', 'spectype']
got = client.retrieve(uuid_list=found.ids,
                      include=inc,
                      dataset_list=['SDSS-DR16', 'BOSS-DR16'])
print(f'{got.records[0].spectype=} {len(got.records[0].flux)=}')

### Fail RETRIEVE with private DR as Anonymous

In [None]:
try:
    got = client.retrieve(uuid_list=found.ids,
                          include=inc,
                          dataset_list=['SDSS-DR16',priv_dr,'BOSS-DR16'])
    gotrecord = True
except Exception as err:
    gotrecord = False
    print(f'Correctly could not retrieve: {err}')

if gotrecord:
    raise Exception(f'Wrongly got record from PRIVATE DR {priv_dr}')

### Pass RETRIEVE with private DRs as Authorized

In [None]:
client.login(auth_user, usrpw)

In [None]:
client.authorized

In [None]:
got = client.retrieve(uuid_list=found.ids,
                      include=inc,
                      dataset_list=['SDSS-DR16', priv_dr, 'BOSS-DR16'])
print(f'{got.count=}')

# All Done

In [None]:
print(f'Run finished: {str(datetime.now())}')