# RRISA Quickstart

All of the IGRINS data is stored using Box. We've generated all of the download links you'll need to access both raw and reduced IGRINS data products.

In this tutorial we walk through how to import the RRISA files into pandas DataFrame objects, how to manipulate DataFrames to find subsets of the data, and how to download IGRINS data products using Python.

```python
# pandas is the only package needed for this section
import pandas as pd
```

### Reading in a RRISA file

pandas has a handy function, read_csv, that reads a .csv file and turns it into a DataFrame. A DataFrame is organized much like a spreadsheet, but it can be sorted, filtered, and searched with Python.
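As a quick illustration before loading the real log, here is a minimal sketch of read_csv on a tiny in-memory CSV. The three rows and their values are invented; only a few of the real RRISA column names are reused. It shows two behaviors worth knowing for later steps: empty fields become NaN, and an all-numeric column such as CIVIL is inferred as an integer rather than a string.

```python
import io

import pandas as pd

# Made-up, three-row stand-in for the RRISA cross-matched log.
# The column names are real RRISA columns; the values are invented.
toy_csv = io.StringIO(
    "NAME,CIVIL,FILENUMBER,SNRH_pix,OBJTYPE\n"
    "Star A,20140707,95,231.7,TAR\n"
    "Star B,20140707,99,551.1,TAR\n"
    "Star C,20160716,59,,\n"
)

toy = pd.read_csv(toy_csv)

print(toy.shape)                         # (rows, columns)
print(toy.dtypes)                        # pandas infers a type for each column
print(toy["SNRH_pix"].isna().tolist())   # empty fields are read as NaN
```

Because CIVIL comes back as an integer, wrapping it in str() is a safe habit whenever it is used as a folder name.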

```python
# read in the cross-matched superlog
xmatch_superlog = pd.read_csv('../RRISA_XMatch/xmatch_log.csv')
```

```python
# display the cross-matched superlog DataFrame
# .head() shows only the first 5 rows of the DataFrame
xmatch_superlog.head()
```

```
Unnamed: 0,NAME,OBJNAME_super,OBJNAME_recipe,MAIN_ID,CIVIL,RA,DEC,FILENAME,FILES,SNRH_pix,...,APOGEE2_[M/H],APOGEE2_[a/M],APOGEE2_[Fe/H],PASTEL_Teff,PASTEL_logg,PASTEL_[Fe/H],PASTEL_Flag,FILE_URL,CAL_URL,RAW_URL
0,* mu.02 Her,LHS 3325,LHS 3325,* mu.02 Her,20140707,266.771375,27.727028,SDCH_20140707_0095.fits,95 96 97 98,231.749786,...,,,,,,,,https://utexas.box.com/shared/static/8roandzn8...,https://utexas.box.com/shared/static/hodoh169z...,https://utexas.box.com/s/p5esie4bqbecxlnv7tm6k...
1,* alf Lyr,vega,vega,* alf Lyr,20140707,279.393667,38.804667,SDCH_20140707_0099.fits,99 100 101 102 103 104 105 106 107 108 109 110,551.093567,...,,,,0.0,,-0.3,,https://utexas.box.com/shared/static/eb5fsr3ci...,https://utexas.box.com/shared/static/hodoh169z...,https://utexas.box.com/s/p5esie4bqbecxlnv7tm6k...
2,HD 164595B,NLTT45791,NLTT45791,HD 164595B,20140707,270.353292,29.581639,SDCH_20140707_0111.fits,111 112 113 114,60.458027,...,,,,0.0,,-0.07,,https://utexas.box.com/shared/static/mccb93th9...,https://utexas.box.com/shared/static/hodoh169z...,https://utexas.box.com/s/p5esie4bqbecxlnv7tm6k...
3,NAME BD+31 3330BC,NLTT46858,NLTT46858,NAME BD+31 3330BC,20140707,280.389208,31.558806,SDCH_20140707_0116.fits,116 117 118 119,124.143341,...,,,,,,,,https://utexas.box.com/shared/static/ysd40crbh...,https://utexas.box.com/shared/static/hodoh169z...,https://utexas.box.com/s/p5esie4bqbecxlnv7tm6k...
4,L 1288-4,GJ797B,GJ797B,L 1288-4,20140707,310.365,19.958444,SDCH_20140707_0124.fits,124 125 126 127 128 129 130 131,106.686111,...,,,,0.0,,-0.07,,https://utexas.box.com/shared/static/jk0ajudat...,https://utexas.box.com/shared/static/hodoh169z...,https://utexas.box.com/s/p5esie4bqbecxlnv7tm6k...
```


### DataFrame Manipulation

Now we can use the DataFrame to look for specific targets! We can start by looking at all of the columns available to us:

```python
xmatch_superlog.columns
```

```
Index(['NAME', 'OBJNAME_super', 'OBJNAME_recipe', 'MAIN_ID', 'CIVIL', 'RA',
       'DEC', 'FILENAME', 'FILES', 'SNRH_pix', 'SNRH_res', 'SNRK_pix',
       'SNRK_res', 'FILENUMBER', 'STANDARD', 'JD', 'OBJTYPE', 'EXPTIME',
       'ROTPA', 'AM', 'BVC', 'FACILITY', 'PI', 'PROGID', 'RA_s', 'DEC_s',
       'IDS', 'OTYPE', 'SP_TYPE', 'SP_BIBCODE', 'PMRA', 'PMDEC', 'PM_BIBCODE',
       'RV_VALUE', 'RV_BIBCODE', 'PLX_VALUE', 'PLX_BIBCODE', 'U', 'B', 'V',
       'R', 'G', 'I', 'J', 'H', 'K', '2MASS_ID', '2MASS_J', '2MASS_H',
       '2MASS_K', '2MASS_Flag', 'GaiaDR3_source', 'GaiaDR3_parallax',
       'GaiaDR3_pm', 'GaiaDR3_bprp', 'GaiaDR3_ebprp', 'GaiaDR3_gmag',
       'GaiaDR3_RV', 'GaiaDR3_teff', 'GaiaDR3_logg', 'GaiaDR3_FeH',
       'GaiaDR3_dist', 'GaiaEDR3_Flag', 'APOGEE2_HRV', 'APOGEE2_teff',
       'APOGEE2_logg', 'APOGEE2_Vsini', 'APOGEE2_[M/H]', 'APOGEE2_[a/M]',
       'APOGEE2_[Fe/H]', 'PASTEL_Teff', 'PASTEL_logg', 'PASTEL_[Fe/H]',
       'PASTEL_Flag', 'FILE_URL', 'CAL_URL', 'RAW_URL']
```
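With this many columns it can help to pull out one catalog's block at a time. The column names above share prefixes such as GaiaDR3_, APOGEE2_ and PASTEL_, so a simple string test works. This is a minimal sketch on an empty DataFrame that reuses a handful of those real column names, so it runs without the RRISA file; with the real log you would use xmatch_superlog in place of log.

```python
import pandas as pd

# Empty DataFrame reusing a few of the real column names listed above.
cols = ["NAME", "MAIN_ID", "SNRH_pix", "SNRK_pix",
        "GaiaDR3_parallax", "GaiaDR3_teff", "APOGEE2_teff", "PASTEL_Teff"]
log = pd.DataFrame(columns=cols)

# Columns from the same catalog share a prefix, so startswith selects a block.
gaia_cols = [c for c in log.columns if c.startswith("GaiaDR3_")]

# DataFrame.filter with a regular expression does a similar selection in one call;
# "(?i)" makes the match case-insensitive, catching both "teff" and "Teff".
teff_cols = log.filter(regex="(?i)teff").columns.tolist()

print(gaia_cols)
print(teff_cols)
```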

Say we only want the highest H-band signal-to-noise spectrum for each object. We can start by sorting the DataFrame on the 'SNRH_pix' column from highest to lowest.

```python
# here I sort by the H-band SNR per pixel, but you can sort per resolution element ('SNRH_res') too
# the inplace=True argument permanently alters the order of the DataFrame!
xmatch_superlog.sort_values(by=['SNRH_pix'], ascending=False, inplace=True)
xmatch_superlog.head()
```

```
Unnamed: 0,NAME,OBJNAME_super,OBJNAME_recipe,MAIN_ID,CIVIL,RA,DEC,FILENAME,FILES,SNRH_pix,...,APOGEE2_[M/H],APOGEE2_[a/M],APOGEE2_[Fe/H],PASTEL_Teff,PASTEL_logg,PASTEL_[Fe/H],PASTEL_Flag,FILE_URL,CAL_URL,RAW_URL
3393,* rho Ser,HR 5899,HR 5899,* rho Ser,20160716,237.992167,20.924472,SDCH_20160716_0059.fits,59 60 61 62,1190.256226,...,,,,3920.0,1.68,-0.17,,https://utexas.box.com/shared/static/ij9xprhxr...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
3398,HD 198345,HR 7969,HR 7969,HD 198345,20160716,,,SDCH_20160716_0099.fits,99 100 101 102 103 104 105 106,1046.151855,...,,,,4010.0,1.78,-0.23,,https://utexas.box.com/shared/static/cynknyq6i...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
13623,L 98-59,L 98-59,L 98-59,L 98-59,20220104,124.532342,-68.316147,SDCH_20220104_0030.fits,30 29 31 32 33 34 35 36 37 38 39 40 41 42 43 4...,1026.182617,...,,,,,,,,https://utexas.box.com/shared/static/15easfc0z...,https://utexas.box.com/shared/static/tsaylqnyy...,https://utexas.box.com/s/fv1b0jr466zlhzuwwvjlk...
3387,HD 194193,HR 7800,HR 7800,HD 194193,20160714,,,SDCH_20160714_0079.fits,79 80 81 82 83 84 85 86,1003.116943,...,,,,,,,,https://utexas.box.com/shared/static/4ufwudxdy...,https://utexas.box.com/shared/static/ibk58bg5a...,https://utexas.box.com/s/fst762i8klw7uwe4lz2qd...
3399,V* T Cyg,HR 7956,HR 7956,V* T Cyg,20160716,311.950667,34.423806,SDCH_20160716_0107.fits,107 108 109 110 111 112 113 114,939.502686,...,,,,4190.0,2.12,-0.12,,https://utexas.box.com/shared/static/kmhouriy5...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
```
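Because inplace=True rewrites the DataFrame, re-running notebook cells out of order can leave it in an unexpected state. A minimal sketch of the non-mutating alternative, on invented toy rows: sort_values returns a new DataFrame and leaves the original alone. It also shows that spectra with a missing SNR are placed at the end by default (na_position="last").

```python
import numpy as np
import pandas as pd

# Toy rows (invented values), one of them without an H-band SNR.
log = pd.DataFrame({
    "MAIN_ID": ["star a", "star b", "star c"],
    "SNRH_pix": [231.7, np.nan, 551.1],
})

# Without inplace=True, sort_values returns a new, sorted DataFrame.
by_snr = log.sort_values(by="SNRH_pix", ascending=False)

# The missing SNR value ends up last; the original order of `log` is untouched.
print(by_snr["MAIN_ID"].tolist())
print(log["MAIN_ID"].tolist())
```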


Then we can drop all rows with a repeated name. If we keep only the first occurrence of each name, we are left with the highest-SNR H-band spectrum for each object!

```python
# here we use the SIMBAD MAIN_ID column because the IGRINS log names may differ for the same object
# again, inplace=True permanently changes the DataFrame
xmatch_superlog.drop_duplicates(subset=['MAIN_ID'], keep='first', inplace=True)
xmatch_superlog.head()
```

```
Unnamed: 0,NAME,OBJNAME_super,OBJNAME_recipe,MAIN_ID,CIVIL,RA,DEC,FILENAME,FILES,SNRH_pix,...,APOGEE2_[M/H],APOGEE2_[a/M],APOGEE2_[Fe/H],PASTEL_Teff,PASTEL_logg,PASTEL_[Fe/H],PASTEL_Flag,FILE_URL,CAL_URL,RAW_URL
3393,* rho Ser,HR 5899,HR 5899,* rho Ser,20160716,237.992167,20.924472,SDCH_20160716_0059.fits,59 60 61 62,1190.256226,...,,,,3920.0,1.68,-0.17,,https://utexas.box.com/shared/static/ij9xprhxr...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
3398,HD 198345,HR 7969,HR 7969,HD 198345,20160716,,,SDCH_20160716_0099.fits,99 100 101 102 103 104 105 106,1046.151855,...,,,,4010.0,1.78,-0.23,,https://utexas.box.com/shared/static/cynknyq6i...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
13623,L 98-59,L 98-59,L 98-59,L 98-59,20220104,124.532342,-68.316147,SDCH_20220104_0030.fits,30 29 31 32 33 34 35 36 37 38 39 40 41 42 43 4...,1026.182617,...,,,,,,,,https://utexas.box.com/shared/static/15easfc0z...,https://utexas.box.com/shared/static/tsaylqnyy...,https://utexas.box.com/s/fv1b0jr466zlhzuwwvjlk...
3387,HD 194193,HR 7800,HR 7800,HD 194193,20160714,,,SDCH_20160714_0079.fits,79 80 81 82 83 84 85 86,1003.116943,...,,,,,,,,https://utexas.box.com/shared/static/4ufwudxdy...,https://utexas.box.com/shared/static/ibk58bg5a...,https://utexas.box.com/s/fst762i8klw7uwe4lz2qd...
3399,V* T Cyg,HR 7956,HR 7956,V* T Cyg,20160716,311.950667,34.423806,SDCH_20160716_0107.fits,107 108 109 110 111 112 113 114,939.502686,...,,,,4190.0,2.12,-0.12,,https://utexas.box.com/shared/static/kmhouriy5...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
```
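One caveat with drop_duplicates: pandas treats all missing (NaN) values as equal, so if some rows have no SIMBAD match (an empty MAIN_ID), only the first of those rows survives and the others are silently dropped. Whether the real log has such rows can be checked with xmatch_superlog['MAIN_ID'].isna().sum(). A minimal sketch on invented rows, showing the problem and one possible workaround (deduplicate only the matched rows, keep every unmatched row):

```python
import numpy as np
import pandas as pd

# Invented rows, already sorted by SNRH_pix (highest first).
# obj2 and obj3 are different objects, but neither has a SIMBAD match.
log = pd.DataFrame({
    "NAME":     ["obj1", "obj1", "obj2", "obj3"],
    "MAIN_ID":  ["* rho Ser", "* rho Ser", np.nan, np.nan],
    "SNRH_pix": [1190.3, 400.0, 300.0, 250.0],
})

# NaN MAIN_IDs count as duplicates of each other, so obj3 disappears.
naive = log.drop_duplicates(subset=["MAIN_ID"], keep="first")

# Workaround: deduplicate the matched rows only, keep all unmatched rows
# (they could instead be deduplicated on another column such as NAME),
# then restore the highest-SNR-first order.
matched = log[log["MAIN_ID"].notna()].drop_duplicates(subset=["MAIN_ID"], keep="first")
unmatched = log[log["MAIN_ID"].isna()]
best = pd.concat([matched, unmatched]).sort_values("SNRH_pix", ascending=False)

print(naive["NAME"].tolist())
print(best["NAME"].tolist())
```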


If we require the SNR to be at least a specific value (here 150 per pixel in the H band), we can apply that cut too:

```python
subset = xmatch_superlog[xmatch_superlog['SNRH_pix'] >= 150.]
subset.head()
```

```
Unnamed: 0,NAME,OBJNAME_super,OBJNAME_recipe,MAIN_ID,CIVIL,RA,DEC,FILENAME,FILES,SNRH_pix,...,APOGEE2_[M/H],APOGEE2_[a/M],APOGEE2_[Fe/H],PASTEL_Teff,PASTEL_logg,PASTEL_[Fe/H],PASTEL_Flag,FILE_URL,CAL_URL,RAW_URL
3393,* rho Ser,HR 5899,HR 5899,* rho Ser,20160716,237.992167,20.924472,SDCH_20160716_0059.fits,59 60 61 62,1190.256226,...,,,,3920.0,1.68,-0.17,,https://utexas.box.com/shared/static/ij9xprhxr...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
3398,HD 198345,HR 7969,HR 7969,HD 198345,20160716,,,SDCH_20160716_0099.fits,99 100 101 102 103 104 105 106,1046.151855,...,,,,4010.0,1.78,-0.23,,https://utexas.box.com/shared/static/cynknyq6i...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
13623,L 98-59,L 98-59,L 98-59,L 98-59,20220104,124.532342,-68.316147,SDCH_20220104_0030.fits,30 29 31 32 33 34 35 36 37 38 39 40 41 42 43 4...,1026.182617,...,,,,,,,,https://utexas.box.com/shared/static/15easfc0z...,https://utexas.box.com/shared/static/tsaylqnyy...,https://utexas.box.com/s/fv1b0jr466zlhzuwwvjlk...
3387,HD 194193,HR 7800,HR 7800,HD 194193,20160714,,,SDCH_20160714_0079.fits,79 80 81 82 83 84 85 86,1003.116943,...,,,,,,,,https://utexas.box.com/shared/static/4ufwudxdy...,https://utexas.box.com/shared/static/ibk58bg5a...,https://utexas.box.com/s/fst762i8klw7uwe4lz2qd...
3399,V* T Cyg,HR 7956,HR 7956,V* T Cyg,20160716,311.950667,34.423806,SDCH_20160716_0107.fits,107 108 109 110 111 112 113 114,939.502686,...,,,,4190.0,2.12,-0.12,,https://utexas.box.com/shared/static/kmhouriy5...,https://utexas.box.com/shared/static/3vlrm5asw...,https://utexas.box.com/s/e0abxa6nxt56j4u9kuiln...
```
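Cuts can also be combined, for example requiring a minimum SNR in both the H and K bands (the log has SNRH_pix and SNRK_pix columns). A minimal sketch on invented values; note that each comparison needs its own parentheses, because & binds more tightly than >= in Python. DataFrame.query expresses the same filter as a string.

```python
import pandas as pd

# Invented SNR values for three spectra.
log = pd.DataFrame({
    "MAIN_ID":  ["a", "b", "c"],
    "SNRH_pix": [1190.0, 200.0, 120.0],
    "SNRK_pix": [900.0, 90.0, 160.0],
})

# Boolean masks combined with & (and) / | (or); parentheses are required.
both_bands = log[(log["SNRH_pix"] >= 150.0) & (log["SNRK_pix"] >= 150.0)]

# The same selection written with DataFrame.query.
both_bands_q = log.query("SNRH_pix >= 150 and SNRK_pix >= 150")

print(both_bands["MAIN_ID"].tolist())
print(both_bands_q["MAIN_ID"].tolist())
```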


We can search for substrings within the SIMBAD identifiers of each object to narrow our list further. Here is an example that shows how to search for the substring "Tau" within the SIMBAD identifiers.

The IDS column of the DataFrame is a long string of all the names SIMBAD associates with an object, separated by "|", as you can see below:

```python
# here iloc[0] grabs the first row of the DataFrame as it is currently sorted, regardless of its index value
# (.loc[0] would select the row that is assigned the index value 0)
subset['IDS'].iloc[0]
```

```
'HIP 77661|Gaia DR3 1216544921044652032|TIC 307915225|PLX 3590|*  38 Ser|* rho Ser|AG+21 1546|BD+21  2829|GC 21311|GCRV  9137|GEN# +1.00141992|GSC 01502-01782|HD 141992|HIC  77661|HR  5899|IRAS 15490+2107|IRC +20286|JP11  2661|NSV  7300|PPM 104370|RAFGL 1803|ROT  2234|SAO  84037|SKY# 28646|TYC 1502-1782-1|UBV   13472|UBV M  20998|USNO 860|YZ  21  5434|2MASS J15511590+2058405|PLX 3590.00|WEB 13142|Gaia DR2 1216544921044244736'
```

```python
# .astype(str) converts every value in the IDS column to a string, so missing (NaN) entries
# become the text 'nan' instead of breaking the boolean selection below
# note that this is a case-sensitive search!
subset_tau = subset[subset['IDS'].astype(str).str.contains('Tau')]
# the first object in this new DataFrame has several identifiers containing "Tau"
subset_tau['IDS'].iloc[0]
```

```
'HIP 16322|Gaia DR3 16370186245017600|TIC 416676377|2MASS J03302446+1120111|PLX  732|* s Tau|*   4 Tau|AG+11  336|BD+10   452|GC  4173|GCRV  1931|GEN# +1.00021686|GSC 00653-01366|HD  21686|HIC  16322|HR  1061|IRAS 03276+1109|N30  713|PPM 119132|ROT   516|SAO  93463|SKY#  5285|TD1  2246|TYC  653-1366-1|UBV    3360|UBV M   9540|YZ  11  1027|uvby98 100021686|PLX  732.00|WEB  3118|Gaia DR2 16370186245017600'
```
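Two refinements may be useful here, sketched on toy IDS strings shortened from the SIMBAD output above (the NaN row stands in for an object without a SIMBAD match). First, str.contains accepts na=False, which treats missing IDS as "no match" without converting NaN to text. Second, a substring search also hits partial matches, so to test for one exact identifier you can split on "|" and compare whole names. SIMBAD pads some identifiers with extra spaces (e.g. "HD  21686"), so the hypothetical helper has_identifier collapses whitespace before comparing.

```python
import numpy as np
import pandas as pd

# Toy rows built from shortened versions of the IDS strings shown above.
log = pd.DataFrame({
    "MAIN_ID": ["* rho Ser", "* s Tau", np.nan],
    "IDS": [
        "HIP 77661|* rho Ser|HD 141992|HR  5899",
        "HIP 16322|* s Tau|*   4 Tau|HD  21686",
        np.nan,
    ],
})

# na=False: missing IDS count as "no match"; regex=False: plain substring search.
has_tau = log["IDS"].str.contains("Tau", na=False, regex=False)

def has_identifier(ids, wanted):
    """Hypothetical helper: True if `wanted` is one of the '|'-separated
    identifiers in `ids`, ignoring differences in internal whitespace."""
    if not isinstance(ids, str):  # NaN or other missing values
        return False
    norm = " ".join(wanted.split())
    return any(" ".join(part.split()) == norm for part in ids.split("|"))

is_hd21686 = log["IDS"].apply(has_identifier, wanted="HD 21686")

print(log.loc[has_tau, "MAIN_ID"].tolist())
print(log.loc[is_hd21686, "MAIN_ID"].tolist())
```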

Finally, we can keep only the targets (rows whose OBJTYPE is 'TAR') among the objects with a 'Tau' identifier:

```python
# OBJTYPE has only a few possible values, so we can go back to matching an exact string ('TAR')
targets_tau = subset_tau[subset_tau['OBJTYPE'] == 'TAR']
targets_tau.head()
```

```
Unnamed: 0,NAME,OBJNAME_super,OBJNAME_recipe,MAIN_ID,CIVIL,RA,DEC,FILENAME,FILES,SNRH_pix,...,APOGEE2_[M/H],APOGEE2_[a/M],APOGEE2_[Fe/H],PASTEL_Teff,PASTEL_logg,PASTEL_[Fe/H],PASTEL_Flag,FILE_URL,CAL_URL,RAW_URL
5407,V* V830 Tau,V830 Tau,V830 Tau,V* V830 Tau,20170915,68.290917,24.56375,SDCH_20170915_0049.fits,49 50 51 52 53 54 55 56,262.543091,...,-0.1387,-0.1077,-0.137,,,,,https://utexas.box.com/shared/static/9fbxh7vn8...,https://utexas.box.com/shared/static/kpffa1nyk...,https://utexas.box.com/s/h82ekluq4wp1t3boqw2yu...
2613,* 21 Tau,HD23432,HD23432,* 21 Tau,20160227,56.695333,24.597806,SDCH_20160227_0041.fits,41 42 43 42 44 45 46 45,250.404816,...,,,,11041.0,,,,https://utexas.box.com/shared/static/nji1wtkia...,https://utexas.box.com/shared/static/ap9pgkpy4...,https://utexas.box.com/s/r63afsdgemg79vughg83p...
10212,HD 32923,HD 32923 (RV Standard),HD 32923 (RV Standard),* m Tau,20201223,76.865542,18.646792,SDCH_20201223_0056.fits,56 57 58 59 60 61 62 63,246.127472,...,,,,5651.0,4.05,-0.22,,https://utexas.box.com/shared/static/g3wzb5y5k...,https://utexas.box.com/shared/static/fcqxx10g5...,https://utexas.box.com/s/2w4mysah4lipl50gvb61s...
4482,HD 28068,Solar Twin HD 28068,Solar Twin HD 28068,V* V906 Tau,20170208,66.600708,16.850694,SDCH_20170208_0043.fits,43 44 45 46 47 48 49 50 51 52 53 54,243.822739,...,,,,5305.0,,0.07,,https://utexas.box.com/shared/static/9geuqk0q4...,https://utexas.box.com/shared/static/8cdwngogy...,https://utexas.box.com/s/79p4x7nvkybmgsz3ra74u...
2614,* h Tau,HD27397,HD27397,* h Tau,20160227,65.196667,14.076056,SDCH_20160227_0047.fits,47 48 49 50,238.391418,...,,,,,,,,https://utexas.box.com/shared/static/tgusy7i1e...,https://utexas.box.com/shared/static/ap9pgkpy4...,https://utexas.box.com/s/r63afsdgemg79vughg83p...
```
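Before filtering on an exact value, it can help to see which values a column actually contains. A minimal sketch using value_counts and isin on invented rows: 'TAR' appears in the log above, while 'STD' is only an assumed label for standard-star observations, used here for illustration. With the real data the first step would be subset_tau['OBJTYPE'].value_counts().

```python
import pandas as pd

# Invented rows; 'STD' is an assumed OBJTYPE value for illustration only.
log = pd.DataFrame({
    "MAIN_ID": ["a", "b", "c", "d"],
    "OBJTYPE": ["TAR", "STD", "TAR", "TAR"],
})

# How many rows have each OBJTYPE value?
counts = log["OBJTYPE"].value_counts()
print(counts)

# isin keeps rows matching any value in the list (add more types as needed).
targets = log[log["OBJTYPE"].isin(["TAR"])]
print(targets["MAIN_ID"].tolist())
```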


### Downloading Files

```python
# we only need the requests package!
import requests
```

The FILE_URL column contains a download link to the reduced data for each row, packaged as a compressed archive (saved below with a .tar.gz extension). Each archive includes both the H and K band data products!

```python
# just looking at our list of objects with "Tau" in an identifier from above
targets_tau['FILE_URL']
```

```
5407     https://utexas.box.com/shared/static/9fbxh7vn8...
2613     https://utexas.box.com/shared/static/nji1wtkia...
10212    https://utexas.box.com/shared/static/g3wzb5y5k...
4482     https://utexas.box.com/shared/static/9geuqk0q4...
2614     https://utexas.box.com/shared/static/tgusy7i1e...
2233     https://utexas.box.com/shared/static/m5ozzxvhu...
274      https://utexas.box.com/shared/static/oex9t3f8w...
5318     https://utexas.box.com/shared/static/5ql55eqo2...
3621     https://utexas.box.com/shared/static/llmf2uwoh...
2430     https://utexas.box.com/shared/static/mlmrrcjt8...
9697     https://utexas.box.com/shared/static/3a3drdq0j...
842      https://utexas.box.com/shared/static/4pbx3rv43...
2158     https://utexas.box.com/shared/static/ei131p6is...
315      https://utexas.box.com/shared/static/fxetfq637...
8679     https://utexas.box.com/shared/static/5b7evyobn...
6051     https://utexas.box.com/shared/static/sc7wrg5g7...
8292     https://utexas.box.com/shared/static/58ng2jflx.
```

Here is an example of downloading one file from the list above using the requests package:

```python
import os

# grab the filename of the file we want to download
# .split('.')[0] keeps everything before the first '.', removing the .fits extension
file_name = targets_tau['FILENAME'].iloc[0].split('.')[0]

# this is the download URL for the compressed archive
download_link = targets_tau['FILE_URL'].iloc[0]

# an example of a folder where you might want to put the file: the civil date of the observation
civil_date = targets_tau['CIVIL'].iloc[0]

# a Session reuses the underlying connection, so repeated requests
# (e.g. downloading several files one after another) are faster
session = requests.Session()

# the response holds everything returned by the request (status code, headers, content)
response = session.get(download_link)

# check the status code to make sure the link was found successfully
if response.status_code == 200:
    # open() will not create missing folders, so create the civil-date folder first
    # (str() in case CIVIL was read in as an integer)
    os.makedirs(str(civil_date), exist_ok=True)
    # write the content to <civil_date>/<file_name>.tar.gz
    # the with-block closes the file automatically, even if writing fails
    with open(f"{civil_date}/{file_name}.tar.gz", 'wb') as f:
        f.write(response.content)
    print(f"Downloaded {file_name}.tar.gz to {civil_date} folder!")
else:
    print(f"Download failed with status code {response.status_code}")
```

```
Downloaded SDCH_20170915_0049.tar.gz to 20170915 folder!
```
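response.content holds the whole archive in memory before it is written out. For large files, requests can stream the body in chunks instead. This is a minimal sketch, not part of the original tutorial: archive_path and stream_download are hypothetical helper names, and the 60-second timeout and 1 MB chunk size are arbitrary choices. raise_for_status() turns a 4xx/5xx reply into a requests.HTTPError, so a failed download is reported rather than silently skipped.

```python
import os

import requests

def archive_path(civil, file_name, root="."):
    """Return <root>/<civil>/<file_name>.tar.gz (str() covers an integer CIVIL)."""
    return os.path.join(root, str(civil), f"{file_name}.tar.gz")

def stream_download(session: requests.Session, url, path, chunk_size=1024 * 1024):
    """Stream the response body to `path` chunk by chunk and return `path`."""
    # open() does not create folders, so make sure the parent folder exists
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with session.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return path
```

With the variables from the cell above, a call would look like stream_download(session, download_link, archive_path(civil_date, file_name)).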


Going back to our example of high-SNR H-band spectra of targets with "Tau" in an identifier, we can download all of the files with the following handy function [adapted from the muler tutorial for downloading files from UT Box](https://muler.readthedocs.io/en/latest/tutorials/Download_IGRINS_data_from_Box.html). I add some file organization using os.makedirs, because the built-in open function will not create new directories to put files into; we have to create them ourselves.

```python
import os

def download_files(download_link, file_number, civil, name, session):
    '''
    Download one archive into an organized directory:
    <civil>/<name>/<civil>_<file number padded to 4 digits>.tar.gz

    input:
        download_link: the Box download link for the file
        file_number: the file number of the target
        civil: the civil date of observation
        name: the name of the object
        session: a requests Session object
    '''
    # request the file from Box
    response = session.get(download_link)
    # make sure the file URL was found
    if response.status_code == 200:
        # create <civil>/<name>/; exist_ok=True means no error if it already exists
        os.makedirs(os.path.join(str(civil), name), exist_ok=True)
        # name the downloaded file after the civil date and the zero-padded file number
        out_name = f"{civil}_{str(file_number).zfill(4)}.tar.gz"
        # write the content; the with-block closes the file automatically
        with open(os.path.join(str(civil), name, out_name), 'wb') as f:
            f.write(response.content)
        # print that the file was downloaded
        print(f"Downloaded {out_name} to the {civil} folder!")
    else:
        print(f"Could not download {download_link} (status code {response.status_code})")
```

```python
# create one session and reuse it for every download
session = requests.Session()

# for the first five rows in the DataFrame
for idx in targets_tau.index[0:5]:
    # get the civil date of the target
    civil = targets_tau.loc[idx, 'CIVIL']
    # build a folder name from the SIMBAD MAIN_ID: remove any '*' characters,
    # then join the remaining words with '_' (e.g. 'V* V830 Tau' -> 'V_V830_Tau')
    name = '_'.join(targets_tau.loc[idx, 'MAIN_ID'].replace('*', '').split())
    # download the tar file
    download_files(targets_tau.loc[idx, 'FILE_URL'], targets_tau.loc[idx, 'FILENUMBER'], civil, name, session)
```

```
Downloaded 20170915_0049.tar.gz to the 20170915 folder!
Downloaded 20160227_0041.tar.gz to the 20160227 folder!
Downloaded 20201223_0056.tar.gz to the 20201223 folder!
Downloaded 20170208_0043.tar.gz to the 20170208 folder!
Downloaded 20160227_0047.tar.gz to the 20160227 folder!
```
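When looping over many rows, two small additions can make the loop more robust: a fallback folder name for rows whose MAIN_ID is missing (NaN has no .replace method, so the name line above would fail), and a check that skips archives already on disk so an interrupted loop can be re-run. A minimal sketch with hypothetical helper names (target_dirname, archive_target, needs_download) that reproduce the naming scheme used above; using NAME as the fallback is an assumption, any column with a usable name would do.

```python
import os

def target_dirname(main_id, fallback):
    """Folder name as in the loop above: drop '*' and join words with '_'.
    Uses `fallback` (e.g. the NAME column) when MAIN_ID is missing."""
    if not isinstance(main_id, str):
        main_id = str(fallback)
    return "_".join(main_id.replace("*", "").split())

def archive_target(civil, name, file_number):
    """Path used by download_files: <civil>/<name>/<civil>_<NNNN>.tar.gz.
    int() also handles file numbers that were read in as floats (e.g. 49.0)."""
    return os.path.join(str(civil), name, f"{civil}_{int(file_number):04d}.tar.gz")

def needs_download(path):
    """True unless a non-empty file already exists at `path`."""
    return not (os.path.isfile(path) and os.path.getsize(path) > 0)
```

Inside the loop, one would compute the path with archive_target and only call download_files when needs_download returns True.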


Since the downloads are compressed tar archives, we can also extract them using Python. Here we extract just one of the files from our example above:

```python
import shutil

civil = targets_tau['CIVIL'].iloc[0]

file_number = targets_tau['FILENUMBER'].iloc[0]

name = '_'.join(targets_tau['MAIN_ID'].iloc[0].replace('*', '').split())

# unpack_archive infers the archive format from the file extension (.tar.gz here)
# the second argument is the folder to extract into; I reuse the archive name without the extension
shutil.unpack_archive(f'{civil}/{name}/{civil}_{str(file_number).zfill(4)}.tar.gz', f'{civil}/{name}/{civil}_{str(file_number).zfill(4)}')
```
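It can be useful to look inside an archive before extracting it, or to state the archive format explicitly instead of relying on the file extension. A minimal, self-contained sketch using the standard-library tarfile and shutil modules: it builds a small .tar.gz in a temporary folder (the member names are made up and are not real IGRINS file names), lists its contents, then extracts it with format="gztar".

```python
import os
import shutil
import tarfile
import tempfile

def list_archive(path):
    """Return the member names of a .tar.gz archive without extracting it."""
    with tarfile.open(path, mode="r:gz") as tar:
        return tar.getnames()

def extract_archive(path, dest):
    """Extract a gzipped tarball into `dest`, naming the format explicitly."""
    os.makedirs(dest, exist_ok=True)
    shutil.unpack_archive(path, dest, format="gztar")
    return sorted(os.listdir(dest))

# Demo archive with made-up member names, built in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    names = ["example_H.fits", "example_K.fits"]
    for fname in names:
        with open(os.path.join(tmp, fname), "wb") as fh:
            fh.write(b"placeholder")
    archive = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(archive, mode="w:gz") as tar:
        for fname in names:
            tar.add(os.path.join(tmp, fname), arcname=fname)
    members = list_archive(archive)
    extracted = extract_archive(archive, os.path.join(tmp, "out"))

print(members)
print(extracted)
```

For a downloaded file, list_archive('20170915/V_V830_Tau/20170915_0049.tar.gz') would show what the archive contains before anything is written to disk.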