# SPLAT Tutorials: Database Query Tools

## Authors
Adam Burgasser

## Version date
22 July 2021

## Learning Goals
* Explore some of the data spreadsheet manipulation tools built into SPLAT (splat.database.prepDB)
* Learn how to use the astroquery wrappers to get source information (splat.database.getPhotometry, splat.database.querySimbad, splat.database.queryXMatch)

## Keywords
astroquery, databases

## Companion Content
None

## Summary
In this tutorial, we are going to see how to use the splat.database functions to manage source spreadsheets and query online databases for source information.


In [1]:
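# main splat import
import splat
import splat.database as spdb
# splat.plot provides plotSpectrum() and plotBatch(), used later in this tutorial
import splat.plot as splot

# other useful imports
import matplotlib.pyplot as plt
import numpy as np
import pandas
import astropy.units as u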

Adding 1051 sources from /Users/adam/projects/splat/Spectra/SPEX_PRISM/ to spectral database
Adding 893 sources from /Users/adam/projects/splat/Spectra/Terrien2015/spectra/ to spectral database
Adding 218 sources from /Users/adam/projects/splat/Spectra/LDSS-3/spectra/ to spectral database
Adding 89 sources from /Users/adam/projects/splat/code/splat//resources/Spectra/Public/MAGE/ to spectral database
Adding 145 sources from /Users/adam/projects/splat/code/splat//resources/Spectra/Public/LRIS-RED/ to spectral database
Adding 2404 sources from /Users/adam/projects/splat/code/splat//resources/Spectra/Public/SPEX-PRISM/ to spectral database
Adding 44 sources from /Users/adam/projects/splat/Spectra/Mann2014/spectra/ to spectral database
Adding 32 sources from /Users/adam/projects/splat/Spectra/LRIS/ to spectral database
Dropped 404 duplicates; 4472 remaining
Could not import regions, which is required for some of the functionalities of this module.


# Prepping datasets

SPLAT uses pandas as its default spreadsheet format. A simple tool, prepDB, manages sets of data to ensure they contain sufficient information to query online catalogs. We're going to explore a couple of cases based on the targets observed by Terrien et al. (2015), for which there are two .csv files in the SPLAT tutorial directory.

In [4]:
# let's start with a file containing only RA and DEC (in decimal degrees)
db = pandas.read_csv(splat.SPLAT_PATH+splat.TUTORIAL_FOLDER+'terrien2015_radec.csv')
db

| | RA | DEC |
|---|---|---|
| 0 | 1.6802 | -7.5374 |
| 1 | 2.2247 | 20.8403 |
| 2 | 2.9709 | 22.9847 |
| 3 | 4.2346 | 5.1239 |
| 4 | 4.5940 | 44.0228 |
| ... | ... | ... |
| 346 | 356.5586 | 28.4343 |
| 347 | 357.3126 | 10.0940 |
| 348 | 357.4744 | 27.3613 |
| 349 | 357.6316 | -9.5589 |


In [6]:
# add in the necessary information for queries with prepDB
# this adds in columns for designation and SkyCoord coordinates
db = spdb.prepDB(db)
db

| | RA | DEC | DESIGNATION | COORDINATES |
|---|---|---|---|---|
| 0 | 1.6802 | -7.5374 | J00064325-0732146 | `<SkyCoord (ICRS): (ra, dec) in deg\n (1.680...` |
| 1 | 2.2247 | 20.8403 | J00085393+2050251 | `<SkyCoord (ICRS): (ra, dec) in deg\n (2.224...` |
| 2 | 2.9709 | 22.9847 | J00115302+2259049 | `<SkyCoord (ICRS): (ra, dec) in deg\n (2.970...` |
| 3 | 4.2346 | 5.1239 | J00165630+0507260 | `<SkyCoord (ICRS): (ra, dec) in deg\n (4.234...` |
| 4 | 4.5940 | 44.0228 | J00182256+4401221 | `<SkyCoord (ICRS): (ra, dec) in deg\n (4.594...` |
| ... | ... | ... | ... | ... |
| 346 | 356.5586 | 28.4343 | J23461406+2826035 | `<SkyCoord (ICRS): (ra, dec) in deg\n (356.5...` |
| 347 | 357.3126 | 10.0940 | J23491502+1005384 | `<SkyCoord (ICRS): (ra, dec) in deg\n (357.3...` |
| 348 | 357.4744 | 27.3613 | J23495386+2721407 | `<SkyCoord (ICRS): (ra, dec) in deg\n (357.4...` |
| 349 | 357.6316 | -9.5589 | J23503158-0933320 | `<SkyCoord (ICRS): (ra, dec) in deg\n (357.6...` |
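The DESIGNATION strings above encode the sexagesimal position directly: RA as HHMMSS.ss and Dec as ±DDMMSS.s, with the separators and decimal points removed. As an illustration of that convention (not SPLAT's internal code), the hypothetical helper `radec_to_designation` below builds such a string from decimal degrees, rounding to the last written digit. It reproduces the designations listed for rows 0, 1 and 346 above.

```python
def radec_to_designation(ra_deg, dec_deg, prefix="J"):
    """Format decimal-degree RA/Dec as a compact J-style designation.

    Hypothetical helper: RA is written as HHMMSSss (seconds to 0.01 s of time),
    Dec as +DDMMSSs (arcseconds to 0.1").
    """
    # RA: degrees -> hundredths of a second of time (1 degree = 240 s), wrapped to 24h
    ra_cs = round((ra_deg % 360.0) * 240.0 * 100) % (24 * 3600 * 100)
    h, rem = divmod(ra_cs, 3600 * 100)
    m, cs = divmod(rem, 60 * 100)
    # Dec: degrees -> tenths of an arcsecond, with the sign handled separately
    sign = "-" if dec_deg < 0 else "+"
    dec_ds = round(abs(dec_deg) * 3600 * 10)
    d, rem = divmod(dec_ds, 3600 * 10)
    dm, ds = divmod(rem, 60 * 10)
    return f"{prefix}{h:02d}{m:02d}{cs:04d}{sign}{d:02d}{dm:02d}{ds:03d}"
```

Rounding the whole position to integer units first, rather than rounding the seconds field alone, avoids producing invalid fields such as "60.00" seconds.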


In [7]:
# alternatively, let's assume we have a file that contains only designations
db = pandas.read_csv(splat.SPLAT_PATH+splat.TUTORIAL_FOLDER+'terrien2015_designations.csv')
db

| | DESIGNATION |
|---|---|
| 0 | J00064325-0732147 |
| 1 | J00085391+2050252 |
| 2 | J00115302+2259047 |
| 3 | J00165629+0507261 |
| 4 | J00182256+4401222 |
| ... | ... |
| 346 | J23461405+2826036 |
| 347 | J23491501+1005385 |
| 348 | J23495384+2721406 |
| 349 | J23503159-0933320 |


In [8]:
# prepDB adds the columns for RA, DEC and SkyCoord coordinates
db = spdb.prepDB(db)
db

| | DESIGNATION | COORDINATES | RA | DEC |
|---|---|---|---|---|
| 0 | J00064325-0732147 | `<SkyCoord (ICRS): (ra, dec) in deg\n (1.680...` | 1.680208 | -7.537242 |
| 1 | J00085391+2050252 | `<SkyCoord (ICRS): (ra, dec) in deg\n (2.224...` | 2.224625 | 20.840283 |
| 2 | J00115302+2259047 | `<SkyCoord (ICRS): (ra, dec) in deg\n (2.970...` | 2.970917 | 22.984464 |
| 3 | J00165629+0507261 | `<SkyCoord (ICRS): (ra, dec) in deg\n (4.234...` | 4.234542 | 5.123892 |
| 4 | J00182256+4401222 | `<SkyCoord (ICRS): (ra, dec) in deg\n (4.594...` | 4.594000 | 44.022783 |
| ... | ... | ... | ... | ... |
| 346 | J23461405+2826036 | `<SkyCoord (ICRS): (ra, dec) in deg\n (356.5...` | 356.558542 | 28.434183 |
| 347 | J23491501+1005385 | `<SkyCoord (ICRS): (ra, dec) in deg\n (357.3...` | 357.312542 | 10.093903 |
| 348 | J23495384+2721406 | `<SkyCoord (ICRS): (ra, dec) in deg\n (357.4...` | 357.474333 | 27.361128 |
| 349 | J23503159-0933320 | `<SkyCoord (ICRS): (ra, dec) in deg\n (357.6...` | 357.631625 | -9.558889 |
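Going the other way means parsing a designation back into sexagesimal fields. The hypothetical `designation_to_radec` below is a minimal sketch that assumes the compact layout shown above: two-digit hours, minutes and seconds followed by fractional digits, then a signed Dec in the same form. Because prepDB's own parsing rules are not described here, its output may differ from the RA and DEC columns above at the sub-arcsecond level.

```python
import re

# hours, minutes, seconds, fractional seconds, sign, degrees, arcmin, arcsec, fractional arcsec
DESIGNATION_PATTERN = re.compile(r"J?(\d{2})(\d{2})(\d{2})(\d*)([+-])(\d{2})(\d{2})(\d{2})(\d*)")

def designation_to_radec(designation):
    """Parse a compact J-style designation (e.g. 'J00064325-0732147') into (ra, dec) in degrees."""
    match = DESIGNATION_PATTERN.fullmatch(designation.strip())
    if match is None:
        raise ValueError(f"Unrecognized designation: {designation}")
    rh, rm, rs, rfrac, sign, dd, dm, ds, dfrac = match.groups()
    # fractional digits follow the whole seconds, e.g. '43' + '25' -> 43.25
    ra_sec = float(f"{rs}.{rfrac or 0}")
    dec_sec = float(f"{ds}.{dfrac or 0}")
    ra = 15.0 * (int(rh) + int(rm) / 60.0 + ra_sec / 3600.0)
    dec = int(dd) + int(dm) / 60.0 + dec_sec / 3600.0
    return ra, (-dec if sign == "-" else dec)
```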


# Getting photometry with getPhotometry

``splat.database.getPhotometry()`` is a wrapper for astroquery's VizieR interface (``astroquery.vizier``), allowing you to query the VizieR catalog service, as well as SIMBAD, for photometry and other information. It is best suited to searching one source at a time; for a large number of sources it is probably better to use ``splat.database.queryXMatch()``. To start, let's find 2MASS, SDSS and WISE data for one source.
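Both kinds of query come down, in essence, to associating your positions with catalog entries by angular proximity. As a rough, self-contained illustration of that idea (not the VizieR or CDS XMatch implementation), the brute-force matcher below pairs each target with the nearest catalog position inside a search radius. The function names and the 5-arcsecond default radius are hypothetical.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds between two positions in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def nearest_match(targets, catalog, radius_arcsec=5.0):
    """For each (ra, dec) target, return the index of the closest catalog entry within radius, or None."""
    matches = []
    for ra, dec in targets:
        best, best_sep = None, radius_arcsec
        # brute force: compare every target against every catalog position
        for j, (cra, cdec) in enumerate(catalog):
            sep = angular_separation(ra, dec, cra, cdec)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches
```

Issuing one remote query per source scales poorly with sample size, which is why a single bulk cross-match is generally the more efficient choice for large samples.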

In [None]:
# selecting by spectral type range and signal-to-noise (value given is minimum S/N)
dp = splat.searchLibrary(spt=['L5','L8'],snr=50)
dp

In [None]:
# selecting by OPTICAL spectral type range and signal-to-noise (value given is minimum S/N)
dp = splat.searchLibrary(opt_spt=['L5','L8'],snr=50)
dp

In [None]:
# select young L dwarfs
dp = splat.searchLibrary(opt_spt=['L0','L9'],young=True)
dp

In [None]:
# select metal-poor L dwarfs
dp = splat.searchLibrary(opt_spt=['L0','L9'],subdwarf=True)
dp

In [None]:
# select giants
dp = splat.searchLibrary(giant=True)
dp

# Reading in the spectra

Once you've identified the spectra you want, you can read them in using the information in the spreadsheet (the DATA_KEY or DATA_FILE columns) or directly with splat.getSpectrum(). Be sure your list is a manageable size!
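One way to keep the list manageable is to check its length before reading anything. The helper below is hypothetical: `reader` stands in for a SPLAT reader such as `splat.Spectrum`, and the 50-spectrum limit is an arbitrary illustrative default.

```python
def read_spectra(keys, reader, max_spectra=50):
    """Read each key with `reader`, refusing to start if the list is longer than max_spectra."""
    keys = list(keys)
    if len(keys) > max_spectra:
        raise ValueError(
            f"{len(keys)} spectra requested; narrow the search or raise max_spectra"
        )
    # read the spectra one at a time, preserving the input order
    return [reader(key) for key in keys]
```

With a searchLibrary() table, usage would look like `splist = read_spectra(dp['DATA_KEY'], splat.Spectrum)`.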

In [None]:
# select metal-poor L dwarfs
# then read in using the data key
dp = splat.searchLibrary(opt_spt=['L0','L9'],subdwarf=True)
splist = []
for i in dp['DATA_KEY']:
    splist.append(splat.Spectrum(i))
    print('Read in spectrum of {}'.format(splist[-1].name))
splist

In [None]:
# do the same but read in by filename
dp = splat.searchLibrary(opt_spt=['L0','L9'],subdwarf=True)
splist = []
for f in dp['DATA_FILE']:
    splist.append(splat.Spectrum(file=f))
    print('Read in spectrum of {}'.format(splist[-1].name))
splist

In [None]:
# the same syntax can be used to read in a list of spectra using splat.getSpectrum()
splist = splat.getSpectrum(opt_spt=['L0','L9'],subdwarf=True)
splist

# Measurements on samples of spectra

We can add measurements to the pandas spreadsheet created by searchLibrary(), which is a convenient way to manage and save analyses.
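The general pattern is to compute one value per row, store the values as a new column, and save the table. Here is a minimal, self-contained sketch of that pattern with pandas. The `demo` table, its file names and the `measure` function are placeholders standing in for searchLibrary() output and a real measurement, and the output file is written to a temporary directory.

```python
import os
import tempfile
import pandas

# stand-in for a searchLibrary() result; the file names are placeholders
demo = pandas.DataFrame({'DATA_FILE': ['spec_a.fits', 'spec_b.fits', 'spec_c.fits']})

def measure(filename):
    """Placeholder measurement; in practice this would read the spectrum and analyze it."""
    return len(filename)

# compute one value per row, then assign the whole column at once
demo['MEASUREMENT'] = [measure(f) for f in demo['DATA_FILE']]

# save the table with its new column so the analysis can be reloaded later
outfile = os.path.join(tempfile.mkdtemp(), 'my_measurements.csv')
demo.to_csv(outfile, index=False)
reloaded = pandas.read_csv(outfile)
```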

In [None]:
# let's measure the classifications of our sources
dp = splat.searchLibrary(opt_spt=['L0','L9'],subdwarf=True)
dp['SPEX_SPT'] = ['']*len(dp)
# note the use of enumerate here, which provides the row position i along with each filename
for i,f in enumerate(dp['DATA_FILE']):
    sp = splat.Spectrum(file=f)
    spt,spt_e = splat.classifyByStandard(sp,method='kirkpatrick')
    # assign through .loc on the dataframe itself to avoid pandas chained assignment
    dp.loc[dp.index[i],'SPEX_SPT'] = spt
dp['SPEX_SPT']

In [None]:
# another way of doing this: collect the results in a list, then assign the whole column at once
dp = splat.searchLibrary(opt_spt=['L0','L9'],subdwarf=True)
spts = []
for f in dp['DATA_FILE']:
    sp = splat.Spectrum(file=f)
    # classifyByStandard returns the type and its uncertainty; keep only the type
    spts.append(splat.classifyByStandard(sp,method='kirkpatrick')[0])
dp['SPEX_SPT'] = spts
dp['SPEX_SPT']
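`classifyByStandard()` assigns a type by comparing each spectrum to spectral standards. The sketch below illustrates the general idea with a simplified, hypothetical `classify_by_chi2`: each standard is scaled by its least-squares factor, and the one with the smallest unweighted χ² over a fixed wavelength range wins. The shared wavelength grid, the 0.9-1.4 µm default range and the lack of noise weighting are all assumptions made for illustration; this is not SPLAT's implementation.

```python
import numpy as np

def classify_by_chi2(wave, flux, standards, fit_range=(0.9, 1.4)):
    """Return (best_name, chi2_dict) comparing flux to each standard over fit_range.

    Simplified illustration: all spectra share one wavelength grid (microns),
    noise is ignored, and each standard is scaled by its least-squares factor.
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    mask = (wave >= fit_range[0]) & (wave <= fit_range[1])
    f = flux[mask]
    chi2 = {}
    for name, std_flux in standards.items():
        s = np.asarray(std_flux, dtype=float)[mask]
        # scale factor that minimizes sum((f - scale * s)**2)
        scale = np.sum(f * s) / np.sum(s * s)
        chi2[name] = float(np.sum((f - scale * s) ** 2))
    # the best match is the standard with the smallest chi-square
    best = min(chi2, key=chi2.get)
    return best, chi2
```

Fitting the scale factor analytically means the comparison depends only on spectral shape, not on each source's brightness.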

In [None]:
# here's how you can measure many indices on the spectra and store them in your pandas dataframe
dp = splat.searchLibrary(opt_spt=['L0','L9'],subdwarf=True)

# first figure out which indices we're measuring
# the names of the indices are the keys of the dictionary returned by measureIndexSet()
sp = splat.Spectrum(file=dp['DATA_FILE'].iloc[0])
ind = splat.measureIndexSet(sp)
indices = list(ind.keys())

# add a column for each index to the dataframe
for indname in indices:
    dp[indname] = np.zeros(len(dp))

# now measure all of the spectra
for i,f in enumerate(dp['DATA_FILE']):
    sp = splat.Spectrum(file=f)
    ind = splat.measureIndexSet(sp)
    # the first element of each entry is the index value; assign via .loc to avoid chained assignment
    for indname in indices:
        dp.loc[dp.index[i],indname] = ind[indname][0]

# print out the values you've measured
dp[indices]
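The cell above stores only the first element of each measureIndexSet() entry. If, as the `[0]` indexing suggests, each entry is a (value, uncertainty) pair, both parts can be kept as separate columns. The sketch below uses a hand-written list of dictionaries with placeholder index names and made-up values in place of real measureIndexSet() output; treating the second element as the uncertainty is an assumption.

```python
import pandas

# placeholder output: one dict per spectrum, each index mapped to (value, uncertainty)
measurements = [
    {'INDEX_1': (0.85, 0.01), 'INDEX_2': (1.02, 0.02)},
    {'INDEX_1': (0.78, 0.02), 'INDEX_2': (0.99, 0.03)},
]

rows = []
for ind in measurements:
    row = {}
    for name, (value, unc) in ind.items():
        row[name] = value          # index value
        row[name + '_E'] = unc     # its uncertainty, in a companion column
    rows.append(row)

# one row per spectrum, with separate value and uncertainty columns
table = pandas.DataFrame(rows)
```

With a real searchLibrary() table `dp`, rows built this way could be attached with `dp = dp.join(pandas.DataFrame(rows, index=dp.index))`, provided the new column names do not already exist in `dp`.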


# Plotting batches of spectra

Here are some examples of plotting samples of spectra using either plotSpectrum() or plotBatch(); you can find more examples at https://spl-toolkit.readthedocs.io/en/latest/splat_plot/

In [None]:
# learn more about these functions
splot.plotSpectrum?

In [None]:
# learn more about these functions
splot.plotBatch?

In [None]:
# read in batch of spectra
splist = splat.getSpectrum(opt_spt=['L0','L9'],subdwarf=True)

In [None]:
# now plot them all using plotSpectrum with the multiplot option
splot.plotSpectrum(splist,multiplot=True)

In [None]:
# let's clean this up a bit by making a 2x2 grid
splot.plotSpectrum(splist,multiplot=True,layout=[2,2])

In [None]:
# the normalization is not so great here, so let's first normalize the spectra in a certain range
# and then set the y-axis range
for sp in splist: sp.normalize([0.9,1.4])
splot.plotSpectrum(splist,multiplot=True,layout=[2,2],yrange=[-0.05,1.2])
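Normalizing in a wavelength range rescales each spectrum so that the same region has comparable flux across the sample. As a sketch, assuming peak normalization (dividing by the maximum flux inside the range; `Spectrum.normalize()` may use a different statistic), the hypothetical helper below does this for plain NumPy arrays:

```python
import numpy as np

def normalize_in_range(wave, flux, wave_range=(0.9, 1.4)):
    """Scale flux so that its maximum inside wave_range (same units as wave) equals 1."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    mask = (wave >= wave_range[0]) & (wave <= wave_range[1])
    if not mask.any():
        raise ValueError("no pixels fall inside wave_range")
    # nanmax ignores missing pixels inside the range
    return flux / np.nanmax(flux[mask])
```

Pixels outside the normalization range can exceed 1, which is why it helps to leave some headroom in yrange.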

In [None]:
# now let's add some details, including the legend giving the name of the source
# and labeling L dwarf features; we'll also save this out as a multi-page pdf file
names = [sp.name for sp in splist]
splot.plotSpectrum(splist,multiplot=True,layout=[2,2],yrange=[-0.05,1.2],legend=names,features=['h2o','feh','co'],telluric=True,grid=True,multipage=True,file='myplot.pdf')
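A multi-page grid like this can also be assembled directly with matplotlib's `PdfPages` if you want finer control over each panel. The sketch below assumes each spectrum is given as a hypothetical (name, wave, flux) tuple rather than a SPLAT Spectrum object; the function name and defaults are illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def plot_grid_pdf(spectra, output, layout=(2, 2), yrange=(-0.05, 1.2)):
    """Write (name, wave, flux) tuples to a multi-page PDF with layout[0] x layout[1] panels per page.

    Returns the number of pages written. `output` may be a filename or a binary file-like object.
    """
    per_page = layout[0] * layout[1]
    npages = 0
    with PdfPages(output) as pdf:
        for start in range(0, len(spectra), per_page):
            page = spectra[start:start + per_page]
            fig, axes = plt.subplots(layout[0], layout[1], figsize=(8, 8), squeeze=False)
            for k, ax in enumerate(axes.flat):
                if k < len(page):
                    name, wave, flux = page[k]
                    ax.plot(wave, flux, 'k-')
                    ax.set_ylim(*yrange)
                    ax.set_title(name)
                    ax.grid(True)
                else:
                    # blank out unused panels on the last page
                    ax.axis('off')
            pdf.savefig(fig)
            plt.close(fig)
            npages += 1
    return npages
```

For SPLAT spectra, something like `plot_grid_pdf([(sp.name, sp.wave.value, sp.flux.value) for sp in splist], 'mygrid.pdf')` should work, assuming `wave` and `flux` are astropy Quantity arrays.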


In [None]:
# plotBatch does many of these tasks in a compact way; here's the baseline call
splot.plotBatch(splist)


In [None]:
# now with the same options as before
# NOTE: ignore the warning messages here
splot.plotBatch(splist,features=['h2o','feh','co'],telluric=True,grid=True,yrange=[-0.05,1.2],output='myplot.pdf')


In [None]:
# plotBatch has a nice feature in that it can automatically classify spectra
# NOTE: the scaling on this doesn't seem to be working properly right now!
splot.plotBatch(splist,classify=True,normalize=True)


In [None]:
# here's an example of comparing all of our sources to one particular comparison source, the sdL0.0 standard
# The subdwarf standards are contained in the splat.STDS_SD_SPEX variable
comptype = 'sdL0.0'
spcomp = splat.STDS_SD_SPEX[comptype]
spcomp.normalize([0.9,1.4])
names = ['{} vs {}'.format(sp.name,comptype) for sp in splist]

splot.plotSpectrum(splist,multiplot=True,layout=[2,2],yrange=[-0.05,1.2],legend=names,comparison=spcomp,colorComparison='r')
