# Keck HIRES Precision Radial Velocity Data Reduction 

<b><i>This service enables reduction and analysis of precision radial velocity (PRV) data from the Keck HIRES instrument. </i></b>

<i><b><font style="color: red;">This notebook is meant to be a template for setting up your own processing using the code snippets provided here. It may not run exactly as presented if you attempt to run the full notebook directly. The code has not been rigorously tested from within a notebook environment.</font></b></i>

This notebook introduces the Keck HIRES Precision Radial Velocity (PRV) pipeline service and works through one specific example. There are a number of variations, mostly concerning how the processing is planned, which other notebooks cover in more detail, but here you will see all the basics.

The notebook is kept with the HIRES PRV Python access toolkit in <a href="https://github.com/Caltech-IPAC/hiresprv">GitHub: https://github.com/Caltech-IPAC/hiresprv</a>

## Login

Logging in the first time creates a workspace for the user and associates it with a KOA account.

Users of this service must have <a href="https://koa.ipac.caltech.edu">Keck Observatory Archive (KOA)</a> accounts and use that login here to gain access to their data.  Even researchers planning to use only public data will need a KOA login as this service is maintaining persistent storage under that ID.

The login is persisted through the use of HTTP cookies, and logging in from multiple clients will connect the user to the same account, storage, and processing history.  This environment (user workspace) is permanent as we expect some on-going research to span years.  The login for a given client machine need only be done once, assuming the cookie file in local storage is not deleted.  If it is, logging in again will reconstruct it.

The cookie file, processing state information, and downloaded results like 1D spectra and RV curve tables will be kept locally in the same space as this notebook.  If you wish to change that, simply add in whatever directory management and navigation you like.
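
As one possible form of that directory management, the sketch below creates a working directory and makes it the current one before logging in, using only the Python standard library. The directory name `prv_work` and the helper `enter_workdir` are hypothetical choices for illustration; nothing in the service requires them.

```python
import os
from pathlib import Path


def enter_workdir(path):
    """Create the working directory if needed and make it the current one."""
    workdir = Path(path).expanduser().resolve()
    workdir.mkdir(parents=True, exist_ok=True)
    os.chdir(workdir)
    return workdir


# 'prv_work' is a hypothetical name. Files written with relative paths
# (such as 'prv.cookies') would then land here instead of next to the notebook.
workdir = enter_workdir('prv_work')
print(workdir)
```

Run something like this before the login cell so that the cookie file and later downloads all end up in the same place.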

In [4]:
from IPython.display import IFrame
from hiresprv.auth import login

login('prv.cookies')

KOA userid: koaadmin
KOA Password: ········
Successful login as koaadmin


## KOA Data Retrieval

The PRV workspace first needs to be populated with data from KOA.  This can be done all at once, if the data exist, or incrementally as the data are taken or identified.

This step is more than a simple data transfer.  "Raw reduction" of the data, which converts the 2D CCD echelle images to 1D spectra, is done up-front as the data are retrieved a night at a time.  The UT dates you give here are actually shifted a few hours to catch any calibration data collected in the afternoon of the same (Hawaii-local) day.

In [2]:
from hiresprv.archive import Archive

koa = Archive('prv.cookies')

rtn = koa.by_dates("""2009-12-31
2013-06-29
2013-09-12
2015-06-06""")


{
    "status": "ok",
    "msg": "Processing dates in background."
}





Note that since the data in the workspace are permanent, repeated requests for the same data change nothing, so dates that have already been processed are ignored.  You can therefore add to the above list or replace it with new dates as you choose.  Dates must be formatted as YYYY-MM-DD.

The above service responds immediately with an acknowledgement of the request and starts the actual transfer and raw data reduction (which can take some time) as a background job.  The job status can be checked by polling or can be monitored using the function below.  While one retrieval job (or processing job below) is running, no others can be initiated.
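
Because the dates must be in YYYY-MM-DD form, it can help to check them on the client side before submitting. The following is a minimal sketch of such a check; `format_dates` is a hypothetical helper, not part of the `hiresprv` package, and it only validates the text format (it knows nothing about which nights actually have data).

```python
from datetime import datetime


def format_dates(dates):
    """Validate YYYY-MM-DD strings and join them one per line."""
    cleaned = []
    for text in dates:
        text = text.strip()
        # strptime raises ValueError for anything that is not a real
        # calendar date (e.g. '2013-02-30' or '12/31/2009').
        parsed = datetime.strptime(text, '%Y-%m-%d')
        # Re-format to reject parseable but unpadded forms such as '2013-6-29'.
        if parsed.strftime('%Y-%m-%d') != text:
            raise ValueError(f'date not in YYYY-MM-DD form: {text!r}')
        if text not in cleaned:  # drop duplicates, keep the original order
            cleaned.append(text)
    return '\n'.join(cleaned)


date_block = format_dates(['2009-12-31', '2013-06-29', '2013-09-12', '2015-06-06'])
print(date_block)
```

The resulting string has one date per line, the same layout as the argument given to `koa.by_dates` above.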

### PRV Processing Monitor

Some steps in the PRV processing can take quite a long time (hours), and we do not want to tie up this notebook page waiting for them to finish.  Below we show how to retrieve a snapshot of the status (you would have to poll manually to track the progress of the job), but the preferred approach is to start a real-time monitor in a separate page or tab, which uses JavaScript and an HTTP event stream.  Run the next cell to generate a link to this monitor:

In [5]:
from hiresprv.status import Status
from IPython.display import HTML

monitor = Status('prv.cookies')

link = monitor.generate_link()

HTML(link)

If you still want a static status snapshot, run the following:

In [11]:
from hiresprv.status import Status

monitor = Status('prv.cookies')

url = monitor.processing_status()

IFrame(url, 950, 500)

### Metadata

Once data have been retrieved and the "nightly" raw reduction performed, a set of records is added to a persistent metadata table, one row per observation.  These observations are all taken through the HIRES PRV instrument (2D CCD) and will have been reduced to 1D spectra by the raw reduction.  They fall into five classes:

<ul>
<li><b>RV observations</b> -- Multiple observations of a star with the iodine cell in the light path.  Precision, relative radial velocities are calculated for this type of observation. <p/></li>
    
<li><b>Templates</b> -- One long observation of the same star without iodine, for reference.<p/></li>
    
<li><b>B stars</b> -- A set of observations of rapidly rotating B stars bracketing the template observation and used to reduce it.<p/></li>
    
<li><b>Iodine</b> -- Reference observation of iodine for nightly calibration.<p/></li>
    
<li>Miscellaneous other calibration observations (labelled as "<b>Unknown</b>").<p/></li>
</ul>

By inspecting this table, the user can determine what objects were observed, whether there are template observations for them (and adequate B star data to reduce a template), and whether there are enough RV measurements to generate a final RV curve.

With a small metadata table, this is simple enough to do by inspection, but a typical workspace can easily have thousands of files covering tens or hundreds of objects.  Furthermore, since observations for a single object are frequently spread out over years, the metadata table is often fairly thoroughly mixed in time.

Luckily, there are a number of tools available in client-side Python to subset and organize the metadata, so we provide it for download as a CSV table, an SQLite binary file, or even, as here, a simple HTML table.  The workspace copy of the data is maintained in an SQLite database, so we also provide a basic filtering mechanism as an optional addition to the download.  This filtering is often adequate for basic processing scenarios.  

Note that metadata retrieval can't be done while the system is "busy" (downloading additional data or further reducing data in the workspace). Otherwise, metadata downloads can be done at any time.

Also note that the client-side file will become out-of-date once new data download or processing requests are submitted.  It is up to the user to re-request the new metadata.


In [7]:
from hiresprv.database import Database

state = Database('prv.cookies')

url = state.search()

IFrame(url, 950,  500)
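
As an illustration of the kind of client-side summary described above, the sketch below counts observations per target and type with Python's built-in `sqlite3` module. It is only a stand-in: it builds a small in-memory table rather than opening a downloaded copy of the workspace database. The `FILES` table and its column names follow the queries used in this notebook, but the rows, the target names, and the `'RV'` type label are invented for the example, and the real table may contain more columns and different labels.

```python
import sqlite3

# Stand-in for a downloaded copy of the workspace database. The rows,
# target names, and the 'RV' OBTYPE label are invented for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('create table FILES (DATE text, OBTYPE text, FILENAME text, TARGET text)')
conn.executemany(
    'insert into FILES values (?, ?, ?, ?)',
    [
        ('2009-12-31', 'RV', 'a1', 'STAR_A'),
        ('2009-12-31', 'RV', 'a2', 'STAR_A'),
        ('2009-12-31', 'TEMPLATE', 'a3', 'STAR_A'),
        ('2015-06-06', 'RV', 'a4', 'STAR_A'),
        ('2013-06-29', 'RV', 'b1', 'STAR_B'),
        ('2013-09-12', 'RV', 'b2', 'STAR_B'),
    ],
)

# Count observations per target and type, to see at a glance which targets
# have a template and how many RV observations each one has.
summary = conn.execute(
    'select TARGET, OBTYPE, count(*) from FILES '
    'group by TARGET, OBTYPE order by TARGET, OBTYPE'
).fetchall()
conn.close()

for target, obtype, count in summary:
    print(f'{target:8s} {obtype:10s} {count}')
```

In this made-up table, `STAR_B` has no template row, which is the sort of gap such a summary is meant to expose.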

## Reducing RV Measurements for a Star

### Subsetting the Metadata: Single Target

Ultimately, to make an RV curve for one star we need to reduce its observations into RV measurements.  Assuming there are adequate B star observations to reduce the template, we can isolate the appropriate records in the metadata above by simply filtering on TARGET name.  There are many ways to do this; in our case we used the remote SQLite query capability and filtered with <p/>

<tt>select DATE, OBTYPE, FILENAME, TARGET, BJD, BCVEL from FILES where TARGET like 'HD185144';</tt> 

The resulting records are shown below.

In [8]:
url = state.search(sql="select DATE, OBTYPE, FILENAME, TARGET, BJD, BCVEL from FILES where TARGET like 'HD185144';")

IFrame(url, 700,  325)

### Templates and B-stars

Another subset that comes up is matching B-Star observations with the template observations they will be used with.  This can be many to many so the easiest quick look is just to list out all B-star and Template observations in time order and then match visually:<p/>

<tt>select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES 
    where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';</tt> 

The resulting records are shown below.

In [9]:
url = state.search(sql="select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';")

IFrame(url, 700,  325)
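
When the list grows too long to match by eye, grouping the rows by date is one possible shortcut. The sketch below assumes the query result is available on the client as `(DATE, OBTYPE, FILENAME)` tuples, and it assumes that B stars and the template they bracket share the same date; both are simplifications for illustration, and the rows and file names shown are invented.

```python
from collections import defaultdict

# Hypothetical rows in (DATE, OBTYPE, FILENAME) form; the file names
# are invented and do not refer to real observations.
rows = [
    ('2009-12-31', 'B Star', 'b1'),
    ('2009-12-31', 'TEMPLATE', 't1'),
    ('2009-12-31', 'B Star', 'b2'),
    ('2013-06-29', 'TEMPLATE', 't2'),
]

# Collect the file names of each type under their date.
by_date = defaultdict(lambda: {'TEMPLATE': [], 'B Star': []})
for date, obtype, filename in rows:
    by_date[date][obtype].append(filename)

# Dates with a template but no B star deserve a closer look.
unmatched = sorted(
    date for date, group in by_date.items()
    if group['TEMPLATE'] and not group['B Star']
)
print(unmatched)
```

This only flags candidate problems; the final pairing still needs the judgment described above.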

### RV Pipeline Processing

This shows that on 12/31/2009 three separate RV observations were made of HD185144 followed by five template observations (which the pipeline will combine into a single template).  Five years later, another three RV observations were made.

As with the data download, the further reduction steps in the pipeline can be quite lengthy (minutes to hours each), so rather than have the user monitor each one, we provide a scripting mechanism so complex reduction jobs can be submitted in one shot.

In order to turn any of the RV observations into an RV value, we need the template.  So we will generate that first.  Since it is possible to repeat the template observations on more than one day, we need to explicitly state which object and which day.  The script command for this is:

<pre>template 185144 20091231</pre>

To reduce an RV measurement, we have to refer to this template (the target name is enough) and specify which file to reduce.  For example:

<pre>rv 185144 r20091231.7</pre>

Finally, once we have a set of RV measurements for an object, we can generate an RV curve (the pipeline finds all the appropriate RV measurements):

<pre>rvcurve 185144</pre>

As long as we follow the general rules that we need a template before we can reduce an RV measurement and at least three RV measurements before we can generate an RV curve, we can otherwise script things in whatever order we wish (<i>e.g.</i> all the templates first).

All of this is submitted to the pipeline as a text script:



In [10]:
from hiresprv.idldriver import Idldriver

idl = Idldriver('prv.cookies')

rtn = idl.run_script("""
template 185144 20091231
rv 185144 r20091231.72
rv 185144 r20091231.73
rv 185144 r20091231.74
rv 185144 r20150606.145
rv 185144 r20150606.146
rv 185144 r20150606.147
rvcurve 185144
""")

print(rtn)

status= ok
msg= Script running in background. Consult monitor for status.
None
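
For longer lists of files, the script text can be generated rather than typed. The following is a minimal sketch of that idea; `build_script` is a hypothetical helper, not part of the `hiresprv` package, and it simply applies the ordering rules described above (template first, at least three RV measurements before the RV curve).

```python
def build_script(target, template_date, rv_files):
    """Return a pipeline script: template first, then RVs, then the RV curve."""
    if len(rv_files) < 3:
        raise ValueError('an RV curve needs at least three RV measurements')
    lines = [f'template {target} {template_date}']
    lines += [f'rv {target} {name}' for name in rv_files]
    lines.append(f'rvcurve {target}')
    return '\n'.join(lines) + '\n'


script = build_script(
    '185144',
    '20091231',
    ['r20091231.72', 'r20091231.73', 'r20091231.74',
     'r20150606.145', 'r20150606.146', 'r20150606.147'],
)
print(script)
```

The resulting string has the same lines as the script submitted above and could be passed to `idl.run_script` in the same way.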


### Monitoring (again)

To monitor the pipeline processing request, the best approach is to use the same monitor page as above.  It stops whenever a given script finishes, but you can restart it at any time to see the currently running job.  You can also insert a monitor start-up or status polling call here.
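
If you do want to poll from the notebook, a generic wait loop is one way to structure it. The sketch below is deliberately independent of the service: `wait_until_idle` and its `is_busy` argument are hypothetical, and how `is_busy` decides whether the workspace is still busy is left to the caller. The demonstration uses a fake status source rather than a real status call.

```python
import time


def wait_until_idle(is_busy, interval=60.0, timeout=6 * 3600.0, sleep=time.sleep):
    """Poll is_busy() until it returns False or the timeout expires.

    is_busy is a caller-supplied function (hypothetical here); this loop
    does not know how the busy state is determined.
    Returns True if the job finished, False if the timeout was reached.
    """
    waited = 0.0
    while is_busy():
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
    return True


# Demonstration with a fake status source that reports "busy" twice.
answers = iter([True, True, False])
finished = wait_until_idle(lambda: next(answers), interval=0.01)
print(finished)
```

Given that jobs can run for hours, a long polling interval is likely the sensible choice; the monitor page remains the preferred tool.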


### Product Retrieval

There is a utility function for retrieving the RV curves (CSV files) for each target
(similarly, there is a function -- data.spectrum -- for retrieving the 1D FITS spectrum files).

In [5]:
from hiresprv.download import Download

data = Download('prv.cookies')

rtn = data.rvcurve('185144')

with open('vst185144.csv', 'r') as file:
  for line in file:
    print(line, end='')


BJD_TDB,RV,RV_ERR,BC,ADU,CHI2
15196.69208800001,-2.860674413602231,0.789710,-4620.095214843750,52362,1.05080
15196.69270200003,1.730315543645411,0.812607,-4620.210937500000,51591,1.04891
15196.69329199987,1.431450932849171,0.803164,-4620.320800781250,48950,1.05804
17180.10972899990,-2.845576383390323,0.890091,3189.794921875000,55029,1.10542
17180.11030799989,1.235438116773508,0.825978,3189.327880859375,55717,1.10771
17180.11088699987,1.047083913731319,0.828699,3188.863037109375,48769,1.10120
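
Since the RV curve is a plain CSV table, it can be read with standard tools. The sketch below parses the first three rows of the table above with the `csv` module and computes an inverse-variance weighted mean of the `RV` column. The rows are embedded as text so the example runs without the downloaded file, and the weighted mean is only an illustration of working with the columns, not a step of the pipeline.

```python
import csv
import io

# First three rows of the RV curve shown above, embedded so that the
# sketch runs without the downloaded file.
text = """BJD_TDB,RV,RV_ERR,BC,ADU,CHI2
15196.69208800001,-2.860674413602231,0.789710,-4620.095214843750,52362,1.05080
15196.69270200003,1.730315543645411,0.812607,-4620.210937500000,51591,1.04891
15196.69329199987,1.431450932849171,0.803164,-4620.320800781250,48950,1.05804
"""

# Convert every column to float, one dictionary per row.
rows = [
    {key: float(value) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(text))
]

# Inverse-variance weighted mean of the RV column and its formal error.
weights = [1.0 / row['RV_ERR'] ** 2 for row in rows]
mean_rv = sum(w * row['RV'] for w, row in zip(weights, rows)) / sum(weights)
mean_err = sum(weights) ** -0.5
print(f'{len(rows)} points, weighted mean RV = {mean_rv:.3f} +/- {mean_err:.3f}')
```

To use the real file, replace `io.StringIO(text)` with the opened `vst185144.csv`.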


## Ancillary Tools

### Workspace Directory Listing

We can get a list of all the downloadable files in the workspace.  There are utility functions (shown above) for quickly downloading the final products, but we often want to see the intermediate products to evaluate or diagnose the processing.

In [7]:
import json
from hiresprv.download import Download

data = Download('prv.cookies')

listing = data.directory_listing()

print(json.dumps(listing, indent=4, sort_keys=True))

{
    "database": "prvState.db",
    "deblazed": [
        "deblazed/r20091231.232.fits",
        "deblazed/r20150606.140.fits",
        "deblazed/r20091231.8.fits",
        "deblazed/r20091231.177.fits",
        "deblazed/r20150606.79.fits",
        "deblazed/r20091231.146.fits",
        "deblazed/r20091231.89.fits",
        "deblazed/r20091231.248.fits",
        "deblazed/r20091231.203.fits",
        "deblazed/r20150606.90.fits",
        "deblazed/r20150606.4.fits",
        "deblazed/r20091231.124.fits",
        "deblazed/r20091231.288.fits",
        "deblazed/r20150606.158.fits",
        "deblazed/r20091231.186.fits",
        "deblazed/r20091231.261.fits",
        "deblazed/r20150606.113.fits",
        "deblazed/r20150606.122.fits",
        "deblazed/r20091231.250.fits",
        "deblazed/r20091231.78.fits",
        "deblazed/r20150606.88.fits",
        "deblazed/r20150606.61.fits",
        "deblazed/r20091231.115.fits",
        "deblazed/r20091231.91.fits",
        "deblazed/r20091

### File Download

Any of the files in the listing can be retrieved by name.

In [2]:
from hiresprv.download import Download

data = Download('prv.cookies')

rtn = data.download('deblazed/r20091231.238.fits', 'deblazed_example.fits')

print(rtn)

{'status': 'ok', 'msg': ''}
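
To pick out several files at once, the listing can be filtered before downloading. The sketch below selects the `deblazed` files for one night from a few entries copied from the listing above. `files_for_night` is a hypothetical helper, and treating the number before `.fits` as a sequence number to sort on is an assumption based only on the file names.

```python
# A few entries copied from the directory listing above; a real listing
# would come from data.directory_listing().
listing = {
    'deblazed': [
        'deblazed/r20091231.232.fits',
        'deblazed/r20150606.140.fits',
        'deblazed/r20091231.8.fits',
        'deblazed/r20091231.177.fits',
    ]
}


def files_for_night(listing, group, night):
    """Return the paths in one listing group for one night, sorted by number."""
    def sequence(path):
        # 'deblazed/r20091231.232.fits' -> 232 (assumed to be a sequence number)
        return int(path.split('.')[-2])

    prefix = f'{group}/r{night}.'
    return sorted((p for p in listing[group] if p.startswith(prefix)), key=sequence)


selected = files_for_night(listing, 'deblazed', '20091231')
print(selected)
```

Each selected path could then be passed to `data.download` together with a local file name, as in the cell above.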


## Augmented Processing

There are no tunable parameters associated with the HIRES PRV pipeline.  However, you can affect the processing in a couple of places by "removing" files (usually because there is something suspect about them).  For instance, a bad B-star file can adversely affect the stellar template calculation, so you may want to rerun the template building without it.  We don't actually delete the file; we deactivate it but leave it in place in case you change your mind.

Similarly, in making the final RV curve you may decide to remove one or more of the reduced RV measurements.


### Deactivating Files

Sometimes it turns out that a data file is suboptimal and should probably be removed from the processing.  It might be a B-star observation bracketing a template measurement or one of the RV observations that should be removed from an RV curve.

The PRV service takes responsibility for remembering what files have been processed and for providing a mechanism (the "DEACTIVATED" column in the database) for "turning off" specific files.  It does not automatically redo all the affected downstream processing.  That is left up to the user so it is best to take care of all of that as soon as possible.

In this example, we will turn off one of the B-star observations. This requires regenerating the template(s) it applies to:


In [14]:
from hiresprv.idldriver import Idldriver

idl = Idldriver('prv.cookies')

rtn = idl.run_script("""
deactivate r20091231.77
template 185144 20091231
""")

print(rtn)

status= ok
msg= Script running in background. Consult monitor for status.
None


Having regenerated the template, we must also regenerate the downstream data (RVs and RV curve).  The underlying processing code does not regenerate a reduced RV file if that file already exists, so we first need to remove the reduced RV files:

In [4]:
from hiresprv.idldriver import Idldriver

idl = Idldriver('prv.cookies')

rtn = idl.run_script("""
deactivate vdaa185144_r20091231.72
deactivate vdaa185144_r20091231.73
deactivate vdaa185144_r20091231.74
deactivate vdaa185144_r20150606.145
deactivate vdaa185144_r20150606.146
deactivate vdaa185144_r20150606.147
rv 185144 r20091231.72
rv 185144 r20091231.73
rv 185144 r20091231.74
rv 185144 r20150606.145
rv 185144 r20150606.146
rv 185144 r20150606.147
rvcurve 185144
""")

print(rtn)

status= ok
msg= Script running in background. Consult monitor for status.
None
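
The rerun script above follows a regular pattern, so it can also be generated. The sketch below is a hypothetical helper, not part of the `hiresprv` package; the `vdaa<target>_<file>` naming of the reduced RV files is inferred from the names used in the script above and should be checked against the database before relying on it.

```python
def build_rerun_script(target, rv_files, prefix='vdaa'):
    """Deactivate the existing reduced RV files, then redo the RVs and curve.

    The reduced-file naming (prefix + target + '_' + file) is an assumption
    inferred from the file names used in this notebook.
    """
    lines = [f'deactivate {prefix}{target}_{name}' for name in rv_files]
    lines += [f'rv {target} {name}' for name in rv_files]
    lines.append(f'rvcurve {target}')
    return '\n'.join(lines) + '\n'


rerun = build_rerun_script('185144', ['r20091231.72', 'r20091231.73', 'r20091231.74'])
print(rerun)
```

All deactivations are listed before the `rv` commands, mirroring the order used in the script above.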


After further consideration, we decide that the second RV measurement was of inferior quality, so we deactivate it, too.  You can find the name of the reduced RV file by listing out the full database again or by doing a database search (if you look at the file name, it follows a simple pattern):

In [20]:
url = state.search(sql="select DATE, OBTYPE, FILENAME, TARGET, BJD, BCVEL from FILES where TARGET like 'rv data';")

IFrame(url, 700,  325)

In [6]:
from hiresprv.idldriver import Idldriver

idl = Idldriver('prv.cookies')

rtn = idl.run_script("""
deactivate vdaa185144_r20091231.73
rvcurve 185144
""")

print(rtn)

status= ok
msg= Script running in background. Consult monitor for status.
None


So now we have only five reduced RV measurements and they were reduced with a slightly different template measurement.  This is reflected in the RV curve, which you can compare to the original:

In [8]:
from hiresprv.download import Download

data = Download('prv.cookies')

rtn = data.rvcurve('185144')

with open('vst185144.csv', 'r') as file:
  for line in file:
    print(line, end='')


BJD_TDB,RV,RV_ERR,BC,ADU,CHI2
15196.69208800001,-2.176984931266687,0.782362,-4620.095214843750,52362,1.04989
15196.69329199987,2.082068283340939,0.778802,-4620.320800781250,48950,1.05804
17180.10972899990,-2.351030457353758,0.769511,3189.794921875000,55029,1.10542
17180.11030799989,1.241724284090688,0.678055,3189.327880859375,55593,1.10771
17180.11088699987,0.9359797393606732,0.686433,3188.863037109375,48769,1.10120
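
One way to make the comparison concrete is to match the two tables on `BJD_TDB`. The sketch below does this for the 2009 epochs, with the `BJD_TDB` and `RV` columns copied from the two tables in this notebook and embedded as text. In practice you would read two files instead, which assumes you kept a copy of the earlier `vst185144.csv` under a different name before downloading the new one.

```python
import csv
import io

# BJD_TDB and RV columns for the 2009 epochs, copied from the original
# and the regenerated RV curves shown in this notebook.
original = """BJD_TDB,RV
15196.69208800001,-2.860674413602231
15196.69270200003,1.730315543645411
15196.69329199987,1.431450932849171
"""
updated = """BJD_TDB,RV
15196.69208800001,-2.176984931266687
15196.69329199987,2.082068283340939
"""


def load(text):
    """Map each BJD_TDB string to its RV value."""
    return {row['BJD_TDB']: float(row['RV']) for row in csv.DictReader(io.StringIO(text))}


old, new = load(original), load(updated)

# Epochs missing from the new curve, and the RV change for those in both.
dropped = sorted(set(old) - set(new))
shifts = {bjd: new[bjd] - old[bjd] for bjd in sorted(set(old) & set(new))}

print('dropped:', dropped)
for bjd, shift in shifts.items():
    print(f'{bjd}  RV change = {shift:+.3f}')
```

Matching on the `BJD_TDB` text works here because both tables print the timestamps identically; with differently formatted files, a numeric tolerance would be safer.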
