In [None]:
__author__ = 'Alice Jacques <alice.jacques@noirlab.edu>, SPARCL team <datalab-spectro@noirlab.edu>'
__version__ = '20250128' # yyyymmdd
__datasets__ = ['desi_dr1']  
__keywords__ = ['sparcl', 'spectroscopy', 'HowTo', 'desi spectra', 'tutorial']

# How to use SPARCL to access DESI DR1 data
## SPectra Analysis and Retrievable Catalog Lab (SPARCL)
Alice Jacques (NOIRLab), Stéphanie Juneau (NOIRLab), Benjamin Weaver (NOIRLab), Steve Pothier (NOIRLab), Adam Bolton (SLAC) and the SPARCL team

### Table of contents
* [Goals & Summary](#goalssummary)
* [Disclaimer & attribution](#attribution)
* [If necessary, install the most recent version of the SPARCL Client](#install)
* [Imports and setup](#imports)
* [Authentication for SPARCL](#auth_sparcl)
* [Data discovery: using SPARCL's <tt>client.find()</tt> method](#datadiscovery)
* [Retrieve records by <tt>sparcl_id</tt> using <tt>client.retrieve()</tt>](#retrieve)
* [Retrieve records by <tt>specid</tt> using <tt>client.retrieve_by_specid()</tt>](#retrieve_specid)
* [Access fields in records](#access)
* [Convert retrieved output to Pandas DataFrame or Spectrum1D object](#convert)
* [Plot spectra](#plot)

<a class="anchor" id="goalssummary"></a>
## Goals & Summary 

SPARCL (SPectra Analysis and Retrievable Catalog Lab) is an online service for discovery and retrieval of one-dimensional optical-infrared spectra. SPARCL is designed to work for large survey datasets containing many millions of spectra, and to provide access to multiple different data sets through common methods. For more information, see the [SPARCL User Guide](https://astrosparcl.datalab.noirlab.edu/static/SPARCLUserManual.pdf).

This notebook provides a basic introduction to using the SPARCL client (or sparclclient) to find and retrieve DESI spectroscopic data within a Python notebook context. The sparclclient connects to the SPARCL server at [NSF NOIRLab](https://noirlab.edu/public/) and provides access to the contents of the SPARCL database.

To see the current data sets and number of spectra available in the SPARCL database, please visit the [SPARCL summary page](https://astrosparcl.datalab.noirlab.edu/sparc/).

To get the latest sparclclient documentation, visit the [sparclclient readthedocs site](https://sparclclient.readthedocs.io/en/latest/).

Feedback on SPARCL (questions, comments, science use cases, feature requests, bug reports, confusing error messages, etc.) can be submitted to datalab-spectro@noirlab.edu. For bug reports and confusing error messages, it's helpful if you include: a log of *what you did*, *the result you got*, and *the result you expected*.

See our science use-case notebooks that use SPARCL:
- [Introduction to DESI Early Data Release (EDR) at the Astro Data Lab](https://github.com/astro-datalab/notebooks-latest/blob/master/03_ScienceExamples/DESI/01_Intro_to_DESI_EDR.ipynb)
- [Comparing SDSS and DESI spectra using SPARCL](https://github.com/astro-datalab/notebooks-latest/blob/master/03_ScienceExamples/DESI/02_DESI_EDR_SDSS_Comparison.ipynb)
- [Stacking SDSS Spectra of Galaxies Selected from the BPT Diagram](https://github.com/astro-datalab/notebooks-latest/blob/master/03_ScienceExamples/EmLineGalaxies/01_EmLineGalaxies_SpectraStack.ipynb)
- [Multi-wavelength Image Cutouts and SDSS Spectra of Active Galaxies with Extreme Emission-Line Ratios](https://github.com/astro-datalab/notebooks-latest/blob/master/03_ScienceExamples/EmLineGalaxies/02_EmLineGalaxies_Outliers.ipynb)

And our other How-To notebooks that use SPARCL:
- [Obtain spectra with SPARCL and plot them with Jdaviz](https://github.com/astro-datalab/notebooks-latest/blob/master/04_HowTos/SPARCL/Plot_SPARCL_Spectra_with_Jdaviz.ipynb)
- [Obtain spectra with SPARCL and plot them with prospect](https://github.com/astro-datalab/notebooks-latest/blob/master/04_HowTos/SPARCL/Plot_SPARCL_Spectra_with_Prospect.ipynb)

<a class="anchor" id="attribution"></a>
# Disclaimer & attribution

Disclaimers
-----------
Note that using the Astro Data Lab constitutes your agreement with our minimal [Disclaimers](https://datalab.noirlab.edu/disclaimers.php).

Acknowledgments
---------------
If you use **Astro Data Lab** in your published research, please include the following text in your paper's Acknowledgments section:

_This research uses services or data provided by the Astro Data Lab, which is part of the Community Science and Data Center (CSDC) Program of NSF NOIRLab. NOIRLab is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under a cooperative agreement with the U.S. National Science Foundation._

If you use **SPARCL jointly with the Astro Data Lab platform** (via JupyterLab, command-line, or web interface) in your published research, please include the following text in your paper's Acknowledgments section:

_This research uses services or data provided by the SPectra Analysis and Retrievable Catalog Lab (SPARCL) and the Astro Data Lab, which are both part of the Community Science and Data Center (CSDC) Program of NSF NOIRLab. NOIRLab is operated by the Association of Universities for Research in Astronomy (AURA), Inc. under a cooperative agreement with the U.S. National Science Foundation._

In either case **please cite the following papers**:

* Data Lab concept paper: Fitzpatrick et al., "The NOAO Data Laboratory: a conceptual overview", SPIE, 9149, 2014, https://doi.org/10.1117/12.2057445

* Astro Data Lab overview: Nikutta et al., "Data Lab - A Community Science Platform", Astronomy and Computing, 33, 2020, https://doi.org/10.1016/j.ascom.2020.100411

If you are referring to the Data Lab JupyterLab / Jupyter Notebooks, cite:

* Juneau et al., "Jupyter-Enabled Astrophysical Analysis Using Data-Proximate Computing Platforms", CiSE, 23, 15, 2021, https://doi.org/10.1109/MCSE.2021.3057097

If publishing in an AAS journal, also add the keyword: `\facility{Astro Data Lab}`

And if you are using SPARCL, please also add `\software{SPARCL}` and cite:

* Juneau et al., "SPARCL: SPectra Analysis and Retrievable Catalog Lab", Conference Proceedings for ADASS XXXIII, 2024, https://doi.org/10.48550/arXiv.2401.05576

The NOIRLab Library maintains [lists of proper acknowledgments](https://noirlab.edu/science/about/scientific-acknowledgments) to use when publishing papers using the Lab's facilities, data, or services.

<a class="anchor" id="install"></a>
### If necessary, install the most recent version of the SPARCL Client:
If you are using the Astro Data Lab Jupyter notebook server, you do not need to run this cell.

**NOTE: After installing the most recent version, please restart your kernel.**

In [None]:
## Uncomment the following only if SPARCL client is not already installed
#!pip install --upgrade sparclclient

<a class="anchor" id="imports"></a>
## Imports and setup

In [None]:
# SPARCL imports
from sparcl.client import SparclClient

# 3rd party imports
import numpy as np
import astropy.units as u
from specutils import Spectrum1D
from astropy.nddata import InverseVariance
from astropy.convolution import convolve, Gaussian1DKernel
import matplotlib.pyplot as plt
import pandas as pd

from getpass import getpass

# plots default setup
plt.rcParams['font.size'] = 14
plt.rcParams['figure.figsize'] = (14,8)

#### Create a SPARCL client instance:

In [None]:
client = SparclClient()
client

<a class="anchor" id="auth_sparcl"></a>
# Authentication for SPARCL
All public SPARCL data sets can be accessed without explicitly logging in. However, some data sets are private and can only be accessed by authorized users. If you are an authorized user and wish to log in to SPARCL, uncomment the cell below and enter your NOIRLab CSDC SSO user name and password. If you need to create an account, sign up at https://sso.csdc.noirlab.edu/account/signup/. If you encounter an issue, email datalab-spectro@noirlab.edu with your First Name, Last Name, and Email Address (the same one you used to create your NOIRLab CSDC SSO account).

To log out of SPARCL after a session, use:
```
client.logout()
```

For assistance with SPARCL authentication/authorization, please contact us at datalab-spectro@noirlab.edu.

In [None]:
# client.login(input("Enter SSO user name: (+ENTER) "), getpass("Enter password: (+ENTER) "))

#### View which data sets you have access to:
**Note:** If you are not logged in, or if your SSO user name is not in the authorized list of SPARCL users, you will only see public data sets.

In [None]:
client.authorized

<a class="anchor" id="datadiscovery"></a>
## Data discovery: using SPARCL's `client.find()` method
The first way you can discover data is with SPARCL's `client.find()` method, which returns the records in the SPARCL database that match the parameters passed to it. Only Core fields may be used in the `outfields` and `constraints` parameters. Descriptions of all fields, including Core fields, are listed on the [SPARCL Fields page](https://astrosparcl.datalab.noirlab.edu/sparc/sfc/). The constraint types for the SPARCL Core fields are:


| Field name       | Constraint type | Example |
|:----------------|:---------------|:-------|
| data_release     | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['DESI-EDR', 'BOSS-DR16', 'SDSS-DR16']
| datasetgroup     | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['DESI', 'SDSS_BOSS']
| dateobs_center   | Range of values | ['2013-03-14T10:16:17Z',<br>'2014-05-24T12:10:00Z']
| dec              | Range of values | [2.03, 7.76]
| exptime          | Range of values | [3603.46, 3810.12]
| instrument       | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['SDSS', 'BOSS', 'DESI']
| ra               | Range of values (may not<br>"wrap" around RA=0) | [44.53, 47.96]
| redshift         | Range of values | [0.5, 0.9]
| redshift_err     | Range of values | [0.000225, 0.000516]
| redshift_warning | List of values  | [0, 3, 5]
| sparcl_id               | List of values (but not<br>intended for data discovery) | ['00001658-460c-4da1-987d-e493d8c9b89b',<br>'000017b6-56a2-4f87-8828-3a3409ba1083']
| site             | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) |  ['apo', 'kpno']
| specid           | List of values | [6988698046080241664, 6971782884823945216]
| spectype         | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['GALAXY', 'STAR', 'QSO']
| specprimary      | List of values (but typically<br>would only include 1 if<br>being used for data<br>discovery constraints) | [1]
| targetid         | List of values | [1237679502171374316, 1237678619584692841]
| telescope        | List of allowed values<br>from [SPARCL Categoricals](https://astrosparcl.datalab.noirlab.edu/sparc/cats/) | ['sloan25m', 'kp4m']
| wavemin          | Range of values | [3607, 3608]
| wavemax          | Range of values | [10363, 10364]
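The `ra` row of the table notes that an RA range may not wrap around RA=0. A minimal sketch for a region that does cross RA=0 (for example 350° to 10°), assuming RA limits in degrees within [0, 360]: split the interval into non-wrapping ranges, run one `client.find()` query per range, and combine the results. The helper `split_ra_range` is hypothetical and not part of sparclclient.

```python
def split_ra_range(ra_min, ra_max):
    """Split an RA interval (degrees) into ranges that do not wrap around RA=0.

    Hypothetical helper. If ra_min <= ra_max the interval is returned unchanged;
    otherwise (e.g. 350 -> 10) it is split at 360/0 into two ranges.
    """
    if not (0 <= ra_min <= 360 and 0 <= ra_max <= 360):
        raise ValueError("RA limits must be within [0, 360] degrees")
    if ra_min <= ra_max:
        return [[ra_min, ra_max]]
    return [[ra_min, 360.0], [0.0, ra_max]]

# Example usage (needs a live SPARCL connection, so it is left commented out):
# for ra_range in split_ra_range(350, 10):
#     cons = {'data_release': ['DESI-DR1'], 'ra': ra_range, 'dec': [-5, 5]}
#     part = client.find(outfields=['sparcl_id', 'ra', 'dec'], constraints=cons, limit=20)
```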

#### Define the fields we want returned (`outfields`) and the constraints (`constraints`):

In [None]:
out = ['sparcl_id', 'specid', 'data_release', 'spectype', 'ra', 'dec', 'redshift']
cons = {'data_release': ['DESI-DR1'],
        'spectype': ['GALAXY'],
        'redshift': [0.1, 0.3]}

#### Execute the `client.find()` method with our parameters:
The `limit` argument is used here for demonstration purposes only; it returns just the first 20 results.

In [None]:
found = client.find(outfields=out, constraints=cons, limit=20)

# Convert to a dataframe
df_found = pd.json_normalize(found.records)

# Print length and a few example rows
print('N results:', len(df_found))
print('N(unique specids):', len(np.unique(df_found['specid'])))

df_found[:3]

<a class="anchor" id="retrieve"></a>
## Retrieve records by `sparcl_id` using `client.retrieve()`
In order to retrieve spectra records from SPARCL by `sparcl_id`, pass the following to the `client.retrieve()` method:

`uuid_list` : List of SPARCL IDs.  
`dataset_list` : List of data sets to search for the SPARCL IDs in (default: None, which will search all available data sets).  
`include` : List of field names to include in each record (default: 'DEFAULT').  
`limit` : Maximum number of records to return (default: 500). Max allowed is 24,000.


**NOTE: A reasonable number of records to retrieve in a single request is about 10,000. Requesting more may cause the retrieval to time out or fail.**
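If you have more IDs than is comfortable to request at once, one option is to split the list into batches of at most 10,000 and call `client.retrieve()` once per batch. A minimal sketch, assuming a plain Python list of IDs; the helper `chunked` is hypothetical, and because the default `limit` is 500, the usage example sets `limit` to the batch size.

```python
def chunked(ids, size=10_000):
    """Yield consecutive slices of `ids`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# Example usage with `sparcl_ids` and `inc` defined as in the cells that follow
# (needs a live SPARCL connection, so it is left commented out):
# all_records = []
# for batch in chunked(sparcl_ids, size=10_000):
#     res = client.retrieve(uuid_list=batch, dataset_list=['DESI-DR1'],
#                           include=inc, limit=len(batch))
#     all_records.extend(res.records)
```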

#### Use the sparcl_ids from the output of using `client.find()` to retrieve records from SPARCL:
Note that `ids` in `found.ids` is a property of the Found class, not a field name of a record. It is the list of `sparcl_id` values of all records returned by `client.find()`.

In [None]:
# Define the fields to include in the retrieve function
inc = ['sparcl_id', 'specid', 'data_release', 'spectype', 'ra', 'dec', 'redshift', 'survey', 'program', 'specprimary', 
       'flux', 'wavelength', 'model', 'ivar', 'mask']

In [None]:
sparcl_ids = found.ids
results = client.retrieve(uuid_list=sparcl_ids, dataset_list=['DESI-DR1'], include=inc)
results.info

<a class="anchor" id="retrieve_specid"></a>
## Retrieve records by `specid` using `client.retrieve_by_specid()`
In order to retrieve spectra records from SPARCL by `specid`, pass the following to the `client.retrieve_by_specid()` method:

`specid_list` : List of specIDs.  
`dataset_list` : List of data sets to search for the specIDs in (default: None, which will search all available data sets).  
`include` : List of field names to include in each record (default: 'DEFAULT').  
`limit` : Maximum number of records to return (default: 500). Max allowed is 24,000.


**NOTE: A reasonable number of records to retrieve in a single request is about 10,000. Requesting more may cause the retrieval to time out or fail.**

#### Use the specIDs from the output of using `client.find()` to retrieve records from SPARCL:

In [None]:
# Define the fields to include in the retrieve_by_specid function
inc = ['sparcl_id', 'specid', 'data_release', 'spectype', 'ra', 'dec', 'redshift',
       'flux', 'wavelength', 'model', 'ivar', 'mask']

In [None]:
specids = [f.specid for f in found.records]
results_specids = client.retrieve_by_specid(specid_list=specids, dataset_list=['DESI-DR1'], include=inc)
results_specids.info

<a class="anchor" id="access"></a>
## Access fields in records
You can access the fields of a record using either dot notation (e.g. `record.flux`) or dictionary indexing (e.g. `record['flux']`).

#### Accessing a record from our example using `client.find()` and `client.retrieve()`:

In [None]:
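# A single record (the first one returned by client.retrieve())
record = results.records[0]

# Dot notation
sparcl_id = record.sparcl_id
specid = record.specid
spectype = record.spectype
redshift = record.redshift
wavelength = record.wavelength
model = record.model
ivar = record.ivar
mask = record.mask

# Dictionary indexing returns the same value as dot notation
flux = record['flux']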
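To collect one field from every record at once, a list comprehension with dictionary indexing is enough. A minimal sketch; the helper `field_array` is hypothetical and assumes each record supports `record[field]`, as stated above. When every record has an equal-length array (such as `flux` on the shared DESI wavelength grid), the result is a 2-D `Nspec x Npix` array.

```python
import numpy as np

def field_array(records, field):
    """Collect `field` from each record (any dict-like object) into a NumPy array.

    Hypothetical helper: it uses dictionary indexing, so it works on plain
    dicts as well as on records that also support dot notation.
    """
    return np.array([rec[field] for rec in records])

# Example usage with the retrieved SPARCL records:
# redshifts = field_array(results.records, 'redshift')
# fluxes = field_array(results.records, 'flux')   # shape (Nspec, Npix)
```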

<a class="anchor" id="convert"></a>
## Convert retrieved output to Pandas DataFrame or Spectrum1D object

#### Pandas DataFrame:

In [None]:
df = pd.json_normalize(results_specids.records)

print(len(df))

# Check the first few rows
df[:2]

#### Spectrum1D:
When applicable, the units for each field are documented on the [Fields tab of the astrosparcl website](https://astrosparcl.datalab.noirlab.edu/sparc/sfc/)

In [None]:
## Take the wavelength array from the first record
wl = df.wavelength[0]

## Alternatively, define it manually, since DESI spectra share a common wavelength grid
#wl = np.arange(3600, 9824.8, 0.8)
wl

In [None]:
## Convert to a Spectrum1D object
# turning the dataframe into Nspec x Npix arrays for flux, ivar, mask
ivar = u.Quantity(np.stack(df.ivar), u.Unit('1e34 erg-2 cm4 s2 AA2'))
specs = Spectrum1D(spectral_axis = wl*u.AA,
                   flux = u.Quantity(np.stack(df.flux), u.Unit('1e-17 erg cm-2 s-1 AA-1')),
                   uncertainty = InverseVariance(ivar),
                   redshift = df.redshift, 
                   mask = np.stack(df['mask']), 
                   meta = {'ra': df.ra, 'dec': df.dec, 'z': df.redshift})

# Store a few columns in the meta attribute, including the redshift, in case one wants
# to save to a FITS file (the redshift attribute does not appear to be saved)

In [None]:
# Plot a couple
f, ax = plt.subplots()  
ax.step(specs[0].spectral_axis, specs[0].flux) 
ax.step(specs[1].spectral_axis, specs[1].flux)
plt.show()

### Exercise

Plot the S/N per pixel using the flux and ivar from the Spectrum1D object. This can be done for one or two example objects, such as the ones plotted above.
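As a starting point for the exercise: since $\sigma = 1/\sqrt{\mathrm{ivar}}$, the S/N per pixel is $f/\sigma = f\sqrt{\mathrm{ivar}}$. A minimal sketch with NumPy, assuming pixels with `ivar <= 0` carry no valid measurement and should get S/N = 0; the helper `snr_per_pixel` is hypothetical. With the flux in units of $10^{-17}$ and ivar in units of $10^{34}$ (see the units above), the product is dimensionless.

```python
import numpy as np

def snr_per_pixel(flux, ivar):
    """Return flux * sqrt(ivar), with S/N = 0 wherever ivar <= 0."""
    flux = np.asarray(flux, dtype=float)
    ivar = np.asarray(ivar, dtype=float)
    snr = np.zeros_like(flux)
    good = ivar > 0
    snr[good] = flux[good] * np.sqrt(ivar[good])
    return snr

# Example usage with the Spectrum1D object built above:
# snr0 = snr_per_pixel(specs[0].flux.value, specs[0].uncertainty.array)
# plt.step(specs[0].spectral_axis.value, snr0)
```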

<a class="anchor" id="plot"></a>
## Plot spectra

The function below takes SPARCL records (the output of `client.retrieve()`) as input. **Exercise**: create a similar function that takes a DataFrame or Spectrum1D object as input.

In [None]:
def plot_spec(index, res):
    """
    Pass an index value and the output from using client.retrieve()
    to plot the spectrum at the specified index.
    """
    
    record = res.records[index]

    sparcl_id = record.sparcl_id
    data_release = record.data_release
    spectype = record.spectype
    flux = record.flux
    wavelength = record.wavelength
    model = record.model
    mask = record.mask

    # Round the redshift, RA, Dec to a reasonable number of significant digits
    redshift = np.round(record.redshift, 4)
    ra = np.round(record.ra, 7)
    dec = np.round(record.dec, 7)

    # Define unmasked pixels as the valid range
    valid = mask==0

    plt.title(f"SPARCL ID = {sparcl_id}\n"
              f"Data Set = {data_release}\n"
              f"Type = {spectype}\n"
              f"RA = {ra}\n"
              f"Dec = {dec}\n"
              f"Redshift = {redshift}\n", loc='left')
    plt.xlabel(r'$\lambda\ [\AA]$')
    plt.ylabel(r'$f_{\lambda}$ $(10^{-17}$ $erg$ $s^{-1}$ $cm^{-2}$ $\AA^{-1})$')
    
    # Plot unsmoothed spectrum in grey
    plt.plot(wavelength, flux, color='k', alpha=0.2, label='Unsmoothed spectrum')
    
    # Overplot spectrum smoothed using a 1-D Gaussian Kernel in black
    plt.plot(wavelength[valid], convolve(flux[valid], Gaussian1DKernel(5)), color='k', label='Smoothed spectrum')
    
    # Overplot the model spectrum in red
    plt.plot(wavelength[valid], model[valid], color='r', label='Model spectrum')
    
    plt.legend()
    plt.show()

In [None]:
plot_spec(index=5, res=results)

### Interactive spectra visualization

Now that you have seen how to create static visualizations of the spectra with the commonly used matplotlib library, you can explore this notebook that shows [how to obtain spectra with SPARCL and plot them with prospect](https://github.com/astro-datalab/notebooks-latest/blob/master/04_HowTos/SPARCL/Plot_SPARCL_Spectra_with_Prospect.ipynb). The [prospect](https://github.com/desihub/prospect) tool has been used for a number of DESI visual inspection campaigns and allows users to pan, zoom, and adjust the smoothing level and the redshift value. It also features buttons to easily navigate through a pre-loaded set of spectra, and displays some catalog information.