Welcome to HUSTLE-tools Stage 3! In this Stage, we will:

1. Load the 1D spectra we extracted in Stage 2.
2. Sum across both orders' 1D spectra to create broadband light curves.
3. Bin the 1D spectra by wavelengths and by columns to create spectral light curves.
4. Save each order's light curves to an xarray file.

Make sure to run the HUSTLE-tools Stage 0, 1, and 2 notebooks before you run this one! This notebook relies on files that were downloaded in Stage 0, reduced in Stage 1, and extracted in Stage 2.

To get started, run the next cell, which imports the packages we need and creates a directory for HUSTLE-tools to operate from. Then follow the next markdown cell for instructions on how to execute Stage 3.

In [1]:
import os
from hustle_tools import run_pipeline

config_directory = 'hustle_config'
if not os.path.exists(config_directory):
    os.makedirs(config_directory)

THE STAGE 3 .HUSTLE FILE

HUSTLE-tools operates using configuration files, designated with the .hustle file extension. Each .hustle file controls one Stage of HUSTLE-tools. .hustle files are more human-readable than raw code and they allow you to easily reproduce previous runs of HUSTLE-tools as well as rapidly share your reduction and analysis methods with your colleagues.
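
To make the layout of a .hustle file concrete, here is a minimal sketch that reads the `name value # comment` lines back into a dictionary. This is not HUSTLE-tools' own parser; `read_settings` is a hypothetical helper written only for illustration, and it assumes that values never contain a `#` and that parsing stops at the `# ENDPARSE` line.

```python
# Illustrative only: a hypothetical reader for the "name  value  # comment" layout.
# HUSTLE-tools has its own parser; this sketch just shows how readable the format is.
sample = """
# Setup for Stage 3
toplevel_dir    'tutorial'      # Directory where your project files are stored.
output_run      'bin_1'         # Str. Name to save the current run to.
verbose         2               # Int from 0 to 2.

# ENDPARSE
"""

def read_settings(text):
    """Return {name: raw value string} for every setting line before # ENDPARSE."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('# ENDPARSE'):
            break                      # assumed end-of-config marker
        if not line or line.startswith('#'):
            continue                   # skip blank lines and section comments
        parts = line.split(None, 1)    # split the name from the rest of the line
        if len(parts) < 2:
            continue                   # a name with no value; ignore it here
        name, rest = parts
        settings[name] = rest.split('#', 1)[0].strip()  # drop the trailing comment
    return settings

settings = read_settings(sample)
for name, value in settings.items():
    print(f"{name} = {value}")
```

Values are kept as raw strings (quotes included) because how HUSTLE-tools evaluates them is not something this sketch tries to reproduce.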

Now we need to make our Stage 3 .hustle file. The next cell contains a template stage_3.hustle file which you can modify to run Stage 3 of HUSTLE-tools. We're going to modify it so that it bins the 1D spectra from the HUSTLE program observations of the hot Jupiter WASP-127b that we downloaded in Stage 0, reduced in Stage 1, and extracted from in Stage 2. To make it do this, make the following changes to the .hustle file template:

1. Set toplevel_dir to 'tutorial' and input_run to the string 'extraction_1' so we can use our extracted spectra from the last Stage.
2. Set output_run to the string 'bin_1' to keep our different binning attempts separated.
3. Set the verbose, show_plots, and save_plots variables to each be 0, 1, or 2. verbose controls how many printed statements the pipeline produces, letting you keep track of what the pipeline is doing and what step it is on. show_plots allows the pipeline to temporarily pause execution to show you an interactive plot. save_plots saves output .png or .gif files for plots and other graphics produced. 0 prints/shows/saves nothing, while 2 prints/shows/saves everything.
4. Set bin_method to "wavelengths" to bin based on wavelengths rather than on detector columns.
5. Set wavelength_bins to np.arange(2000,8100,100). This tells HUSTLE-tools to bin data from 2000 to 8000 AA in bins of width 100 AA or 10 nm, which will produce 60 light curves in all.

As you modify each variable, take a moment to read the comment to its right. These comments tell you what each variable does and which other values it accepts.
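
The bin count quoted in item 5 can be checked with a few lines of Python. This sketch assumes, as the config comment states, that consecutive entries of wavelength_bins are bin edges. The built-in `range` is used only so the snippet needs nothing beyond the standard library; `np.arange(2000,8100,100)` produces the same values.

```python
# Same values as np.arange(2000, 8100, 100): the stop value 8100 is excluded,
# so the last edge is 8000 AA.
edges = list(range(2000, 8100, 100))

# Pair each edge with the next one to get (lower, upper) limits for every bin.
bins = list(zip(edges[:-1], edges[1:]))

print(len(edges), "edges ->", len(bins), "bins")
print("first bin:", bins[0], "last bin:", bins[-1])
```

Note that N edges always give N-1 bins, which is why the stop value has to be one step beyond the last edge you want.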

In [6]:
hustle_stage_3_file = f"""
# HUSTLE-tools config file for launching Stage 3: Binning

# Setup for Stage 3
toplevel_dir    './files'                                   # Directory where your current project files are stored. This folder should contain the specimages/, directimages/, etc. folders with your data as well as the outputs folder.
input_run       'run_1'                                     # Str. This is the name of the Stage 2 run you want to load.
output_run      'run_1'                                     # Str. This is the name to save the current run to. It can be anything that does not contain spaces or special characters (e.g. $, %, @, etc.).
verbose         2                                           # Int from 0 to 2. 0 = print nothing. 1 = print some statements. 2 = print every action.
show_plots      2                                           # Int from 0 to 2. 0 = show nothing. 1 = show some plots. 2 = show all plots.
save_plots      2                                           # Int from 0 to 2. 0 = save nothing. 1 = save some plots. 2 = save all plots.

# Step 1: Read in the data
orders          ('+1','-1')                                 # List of string. The orders you want to load and operate on.

# Step 2: Light curve extraction
bin_method      'columns'                                   # Str. How to bin the light curves. Options are 'columns' (bin N columns at a time) or 'wavelengths' (bin from wavelength1 to wavelength2).
wavelength_bins np.arange(2000,4010,10)                     # List of floats or numpy array. If bin_method is 'wavelengths', defines edges of each wavelength bin.
N_columns       50                                          # Int. If bin_method is 'columns', how many columns go into each bin.
normalize       True                                        # Bool. If True, normalizes curves by out-of-transit/eclipse flux.
reject_bad_cols False                                       # bool. If True, masks contributions from columns deemed too noisy.
bad_col_thres   0.001                                       # float. Used to control how aggressively we flag columns. The lower the number, the less noisiness we tolerate in our columns.

# Step 3: Light curve post-processing
sigma_clip      None                                        # Float or None. If float, the sigma at which to mask outliers in sigma clipping.

# ENDPARSE
"""

# Now we write the contents of the config file out to a .hustle file.
with open(os.path.join(config_directory,'stage_3_input_config.hustle'), 'w') as f:
    f.write(hustle_stage_3_file)

In [4]:
# Solution:
hustle_stage_3_file = f"""
# HUSTLE-tools config file for launching Stage 3: Binning

# Setup for Stage 3
toplevel_dir    'tutorial'                                  # Directory where your current project files are stored. This folder should contain the specimages/, directimages/, etc. folders with your data as well as the outputs folder.
input_run       'extraction_1'                              # Str. This is the name of the Stage 2 run you want to load.
output_run      'bin_1'                                     # Str. This is the name to save the current run to. It can be anything that does not contain spaces or special characters (e.g. $, %, @, etc.).
verbose         2                                           # Int from 0 to 2. 0 = print nothing. 1 = print some statements. 2 = print every action.
show_plots      0                                           # Int from 0 to 2. 0 = show nothing. 1 = show some plots. 2 = show all plots.
save_plots      1                                           # Int from 0 to 2. 0 = save nothing. 1 = save some plots. 2 = save all plots.

# Step 1: Read in the data
orders          ('+1','-1')                                 # List of string. The orders you want to load and operate on.

# Step 2: Light curve extraction
bin_method      'wavelengths'                               # Str. How to bin the light curves. Options are 'columns' (bin N columns at a time) or 'wavelengths' (bin from wavelength1 to wavelength2).
wavelength_bins np.arange(2000,8100,100)                    # List of floats or numpy array. If bin_method is 'wavelengths', defines edges of each wavelength bin.
N_columns       50                                          # Int. If bin_method is 'columns', how many columns go into each bin.
normalize       True                                        # Bool. If True, normalizes curves by out-of-transit/eclipse flux.
reject_bad_cols False                                       # bool. If True, masks contributions from columns deemed too noisy.
bad_col_thres   0.001                                       # float. Used to control how aggressively we flag columns. The lower the number, the less noisiness we tolerate in our columns.

# Step 3: Light curve post-processing
sigma_clip      None                                        # Float or None. If float, the sigma at which to mask outliers in sigma clipping.

# ENDPARSE
"""

# Now we write the contents of the config file out to a .hustle file.
with open(os.path.join(config_directory,'stage_3_input_config_solution.hustle'), 'w') as f:
    f.write(hustle_stage_3_file)

Now that our config file is ready, simply use the cell below to execute Stage 3 of the pipeline!

This Stage requires no user interaction unless show_plots is set to greater than 0, which prompts HUSTLE-tools to interrupt execution to show the user the plots being generated in an interactive format. If you have set show_plots to 0, then the pipeline will automatically finish running in about 1 minute.

In [3]:
run_pipeline(config_files_dir=config_directory,
             stages=(3,))

  state_vects = (vects - median)/std

  state_vects = (vects - median)/std



Writing config file for Stage 3...
Config file written.


You made it! I hope there were no problems with execution. Now let's check out the outputs!

Inside the 'tutorial/outputs/' directory should now be 'stage3/bin_1/'. Inside that folder you will find:
- light_curves_+1.nc and light_curves_-1.nc, which are xarray files containing all of your binned light curves
- stage_3_bin_1.hustle, a copy of the config file you used for this binning run
- plots/, a folder full of diagnostic plots that let you know how the binning went
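
If you want to look inside the light curve files yourself, the sketch below opens them with xarray's `open_dataset` and prints a summary. The paths follow the folder layout described above; `open_light_curves` is a hypothetical helper, not part of HUSTLE-tools, and no variable names inside the files are assumed, so the snippet only prints whatever the dataset contains.

```python
import os

def open_light_curves(path):
    """Open a Stage 3 light curve file with xarray, or return None if it is missing."""
    if not os.path.exists(path):
        print(f"No file found at {path}; has Stage 3 finished running?")
        return None
    import xarray as xr  # imported here so the existence check runs even without xarray
    return xr.open_dataset(path)

for order in ('+1', '-1'):
    path = os.path.join('tutorial', 'outputs', 'stage3', 'bin_1',
                        f'light_curves_{order}.nc')
    ds = open_light_curves(path)
    if ds is not None:
        print(ds)   # summary of dimensions, coordinates, and data variables
        ds.close()  # release the file handle when done
```

Printing the dataset is a quick way to learn which coordinates and data variables Stage 3 wrote before you start plotting or fitting.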

The rawslc_order+1 and -1 .gif files will show you each spectroscopic light curve in sequence, while the waterfall .png files show you all the spectroscopic light curves at once. Both plots are a useful way to see how your extracted light curves turned out. As before, Hubble Space Telescope systematics will be present, with the UV curves at 200-400 nm showing the most severe systematics. Despite that, they are still treatable, and we will treat them in the next Stage.

Next, let's see what happens when you bin by column instead of wavelength, which can help boost SNR in places where spectral dispersion is low. Keep in mind that the wavelength solutions of the +1 and -1 orders are different, so the same column bins cover different wavelengths in each order and you won't be able to combine your extracted spectra at the end of the pipeline. This may create extra challenges for modelling. Copy your config cell into the cell below and make the following changes:

1. Set output_run to 'bin_2' so we can compare our different binning techniques.
2. Set bin_method to 'columns' so we can use the column binning technique.
3. Set N_columns to 10. G280 traces span about 500-550 columns, so we can expect to pull about 50-55 light curves from this process.
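
The estimate in item 3 is simple integer division, sketched below. The helper name `n_column_bins` is hypothetical, and the sketch assumes that any leftover columns that do not fill a whole bin are dropped; HUSTLE-tools may treat the final partial bin differently.

```python
def n_column_bins(n_trace_columns, n_columns_per_bin):
    """Number of complete bins when grouping trace columns N at a time."""
    return n_trace_columns // n_columns_per_bin

# G280 traces span roughly 500-550 columns; we bin 10 columns at a time.
for n_trace_columns in (500, 550):
    print(n_trace_columns, "columns ->", n_column_bins(n_trace_columns, 10), "light curves")
```

With the template default of 50 columns per bin, the same traces would give only about 10-11 light curves.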

In [5]:
hustle_stage_3_file = f"""
# HUSTLE-tools config file for launching Stage 3: Binning

# Setup for Stage 3
toplevel_dir    './files'                                   # Directory where your current project files are stored. This folder should contain the specimages/, directimages/, etc. folders with your data as well as the outputs folder.
input_run       'run_1'                                     # Str. This is the name of the Stage 2 run you want to load.
output_run      'run_1'                                     # Str. This is the name to save the current run to. It can be anything that does not contain spaces or special characters (e.g. $, %, @, etc.).
verbose         2                                           # Int from 0 to 2. 0 = print nothing. 1 = print some statements. 2 = print every action.
show_plots      2                                           # Int from 0 to 2. 0 = show nothing. 1 = show some plots. 2 = show all plots.
save_plots      2                                           # Int from 0 to 2. 0 = save nothing. 1 = save some plots. 2 = save all plots.

# Step 1: Read in the data
orders          ('+1','-1')                                 # List of string. The orders you want to load and operate on.

# Step 2: Light curve extraction
bin_method      'columns'                                   # Str. How to bin the light curves. Options are 'columns' (bin N columns at a time) or 'wavelengths' (bin from wavelength1 to wavelength2).
wavelength_bins np.arange(2000,4010,10)                     # List of floats or numpy array. If bin_method is 'wavelengths', defines edges of each wavelength bin.
N_columns       50                                          # Int. If bin_method is 'columns', how many columns go into each bin.
normalize       True                                        # Bool. If True, normalizes curves by out-of-transit/eclipse flux.
reject_bad_cols False                                       # bool. If True, masks contributions from columns deemed too noisy.
bad_col_thres   0.001                                       # float. Used to control how aggressively we flag columns. The lower the number, the less noisiness we tolerate in our columns.

# Step 3: Light curve post-processing
sigma_clip      None                                        # Float or None. If float, the sigma at which to mask outliers in sigma clipping.

# ENDPARSE
"""

# Now we write the contents of the config file out to a .hustle file.
with open(os.path.join(config_directory,'stage_3_input_config.hustle'), 'w') as f:
    f.write(hustle_stage_3_file)

In [4]:
# Solution:
hustle_stage_3_file = f"""
# HUSTLE-tools config file for launching Stage 3: Binning

# Setup for Stage 3
toplevel_dir    'tutorial'                                  # Directory where your current project files are stored. This folder should contain the specimages/, directimages/, etc. folders with your data as well as the outputs folder.
input_run       'extraction_1'                              # Str. This is the name of the Stage 2 run you want to load.
output_run      'bin_2'                                     # Str. This is the name to save the current run to. It can be anything that does not contain spaces or special characters (e.g. $, %, @, etc.).
verbose         2                                           # Int from 0 to 2. 0 = print nothing. 1 = print some statements. 2 = print every action.
show_plots      0                                           # Int from 0 to 2. 0 = show nothing. 1 = show some plots. 2 = show all plots.
save_plots      1                                           # Int from 0 to 2. 0 = save nothing. 1 = save some plots. 2 = save all plots.

# Step 1: Read in the data
orders          ('+1','-1')                                 # List of string. The orders you want to load and operate on.

# Step 2: Light curve extraction
bin_method      'columns'                                   # Str. How to bin the light curves. Options are 'columns' (bin N columns at a time) or 'wavelengths' (bin from wavelength1 to wavelength2).
wavelength_bins np.arange(2000,8100,100)                    # List of floats or numpy array. If bin_method is 'wavelengths', defines edges of each wavelength bin.
N_columns       10                                          # Int. If bin_method is 'columns', how many columns go into each bin.
normalize       True                                        # Bool. If True, normalizes curves by out-of-transit/eclipse flux.
reject_bad_cols False                                       # bool. If True, masks contributions from columns deemed too noisy.
bad_col_thres   0.001                                       # float. Used to control how aggressively we flag columns. The lower the number, the less noisiness we tolerate in our columns.

# Step 3: Light curve post-processing
sigma_clip      None                                        # Float or None. If float, the sigma at which to mask outliers in sigma clipping.

# ENDPARSE
"""

# Now we write the contents of the config file out to a .hustle file.
with open(os.path.join(config_directory,'stage_3_input_config_solution.hustle'), 'w') as f:
    f.write(hustle_stage_3_file)

In [5]:
run_pipeline(config_files_dir=config_directory,
             stages=(3,))

  state_vects = (vects - median)/std



0 outliers clipped from broadband light curve.
0 outliers clipped from light curve of wavelength 2070.11 AA.
0 outliers clipped from light curve of wavelength 2191.72 AA.
0 outliers clipped from light curve of wavelength 2317.21 AA.
0 outliers clipped from light curve of wavelength 2446.07 AA.
0 outliers clipped from light curve of wavelength 2577.83 AA.
0 outliers clipped from light curve of wavelength 2712.09 AA.
0 outliers clipped from light curve of wavelength 2848.46 AA.
0 outliers clipped from light curve of wavelength 2986.61 AA.
0 outliers clipped from light curve of wavelength 3126.24 AA.
0 outliers clipped from light curve of wavelength 3267.11 AA.
0 outliers clipped from light curve of wavelength 3408.98 AA.
0 outliers clipped from light curve of wavelength 3551.65 AA.
0 outliers clipped from light curve of wavelength 3694.97 AA.
0 outliers clipped from light curve of wavelength 3838.80 AA.
0 outliers clipped from light curve of wavelength 3983.03 AA.
0 outliers clipped from

  state_vects = (vects - median)/std



0 outliers clipped from broadband light curve.
0 outliers clipped from light curve of wavelength 2087.66 AA.
0 outliers clipped from light curve of wavelength 2259.32 AA.
0 outliers clipped from light curve of wavelength 2427.80 AA.
0 outliers clipped from light curve of wavelength 2593.48 AA.
0 outliers clipped from light curve of wavelength 2756.72 AA.
0 outliers clipped from light curve of wavelength 2917.84 AA.
0 outliers clipped from light curve of wavelength 3077.13 AA.
0 outliers clipped from light curve of wavelength 3234.86 AA.
0 outliers clipped from light curve of wavelength 3391.27 AA.
0 outliers clipped from light curve of wavelength 3546.56 AA.
0 outliers clipped from light curve of wavelength 3700.94 AA.
0 outliers clipped from light curve of wavelength 3854.57 AA.
0 outliers clipped from light curve of wavelength 4007.60 AA.
0 outliers clipped from light curve of wavelength 4160.15 AA.
0 outliers clipped from light curve of wavelength 4312.33 AA.
0 outliers clipped from



Writing config file for Stage 3...
Config file written.


As expected, the wavelength bins in the +1 and -1 orders are similar but slightly different.
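
To put numbers on that difference, the sketch below takes the first five wavelengths printed in each of the two log blocks above and computes the spacing between neighbouring bins. Which block belongs to which order is an assumption here (the first is taken to be +1, following the order of the `orders` setting).

```python
# First five bin wavelengths (AA) copied from the two log blocks above.
# Assumption: the first block is the +1 order and the second is the -1 order.
first_block = [2070.11, 2191.72, 2317.21, 2446.07, 2577.83]
second_block = [2087.66, 2259.32, 2427.80, 2593.48, 2756.72]

def spacings(wavelengths):
    """Difference between each pair of neighbouring bin wavelengths."""
    return [round(b - a, 2) for a, b in zip(wavelengths, wavelengths[1:])]

print("first block spacing (AA): ", spacings(first_block))
print("second block spacing (AA):", spacings(second_block))
```

Ten columns span roughly 120-130 AA in one order and roughly 160-170 AA in the other over this range, so bins with the same column index do not line up in wavelength.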

That's all for Stage 3! You can execute this Stage for any G280 time series observation you have reduced.