# GLS PERIODOGRAMS - S2 SAMPLE BATCH FILE PROCESSING

In [1]:
# IMPORTANT NOTE - HOW TO IMPLEMENT THE "TIMEOUT":
# https://stackoverflow.com/questions/25027122/break-the-function-after-certain-time
# DOES NOT WORK HERE - THAT APPROACH ONLY WORKS IN UNIX ENVIRONMENTS

This notebook takes the S2 sample RV curves and calculates the Generalised Lomb-Scargle periodogram for each of the RV curves, storing the results in plain text or `FITS` files with a primary header and two additional header data units, one with the RV curve itself and another with the periodogram.

Additionally, the S2 sample objects table and file is updated with new columns storing the results for each object.

Error control is implemented so that exceptions and timeouts do not interrupt the processing. If any of these errors occurs, it is recorded in the results table. The timeout value is dynamically updated as the calculation for each record is completed.
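As an illustration of the timeout idea, the following is a minimal, platform-independent sketch based on a worker thread from the standard library, instead of the Unix-only signal approach. The names `run_with_timeout` and `next_timeout`, as well as the `factor` and `floor_s` defaults, are hypothetical and are not part of this notebook. Note that a timed-out thread is abandoned, not killed, so this only keeps the loop moving.

```python
import concurrent.futures
import statistics
import time


def run_with_timeout(func, timeout_s, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread.

    Returns (True, result) on success, or (False, error_text) when the
    call raises an exception or does not finish within timeout_s seconds.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, "timeout after %.1f s" % timeout_s
    except Exception as exc:
        return False, "%s: %s" % (type(exc).__name__, exc)
    finally:
        # Do not block on a stuck worker; the thread itself is NOT killed.
        executor.shutdown(wait=False)


def next_timeout(lapse_list, factor=10.0, floor_s=60.0):
    """Hypothetical rule: 'factor' times the median lapse so far,
    but never below floor_s seconds."""
    if not lapse_list:
        return floor_s
    return max(floor_s, factor * statistics.median(lapse_list))


# Example: a call that finishes in time and one that does not.
print(run_with_timeout(lambda x: x * 2, 1.0, 21))
print(run_with_timeout(time.sleep, 0.05, 0.3))
```

The `(ok, payload)` return value maps directly onto the "valid" flag and "error" text stored for each record.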

**NOTE:** the browser crashes if all loop iterations are executed at once, possibly because of a memory leak. Hence, the loop must be executed in batches (of 100 elements, for example).
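One possible way to organise the batches is to select, on each run, only the records that have not been processed yet. The helper below is a hypothetical sketch (it is not used in this notebook). It assumes that an unprocessed record carries `None` or `NaN` in a flag column such as `valid_PG_RV_S2`, and that processed records carry 1 (success) or 0 (failure).

```python
def is_missing(flag):
    """True for None or NaN (NaN is the only value not equal to itself)."""
    return flag is None or flag != flag


def next_batch(valid_flags, batch_size=100):
    """Return the indices of the next records still to be processed.

    valid_flags: one entry per record; None/NaN means 'not processed yet',
    1 means success and 0 means failure.
    """
    pending = [i for i, flag in enumerate(valid_flags) if is_missing(flag)]
    return pending[:batch_size]


flags = [1, 1, 0, None, None, None, None]
print(next_batch(flags, batch_size=3))  # [3, 4, 5]
```

With a dataframe, the list of flags could be obtained with something like `gto['valid_PG_RV_S2'].tolist()`, and the loop would then iterate over the returned indices.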

**Calculation conditions:**

The stars currently under analysis are in the $He_{3}$ instability band, so they are expected to have effective temperatures in the range $T_{eff}\in[3300,\;4300]\;K$, $\log g\in[4.5,\;5.1]$, and masses in the range $M_{star}\in[0.20,\;0.60]\;M_{\odot}$.

According to _Table 2_ in _The theoretical instability strip of M dwarf stars (Rodríguez-López, C., et al. 2014, MNRAS, 438, 2371)_, these stars have typical pulsation periods of $20\;min$ to $3\;h$ (that is, $P_{pulsation}\in[0.013889,\;0.125000]\;d$, or frequencies of $f_{pulsation}\in[72.0,\;8.0]\;d^{-1}$). Allowing a margin around these values, we set the periodogram limits to periods between $5\;min$ and $10\;h$. In days, this corresponds to a range of $P\in[0.003472,\;0.416667]\;d$ or, equivalently, frequencies in the range $f\in[288.0,\;2.4]\;d^{-1}$.
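The period and frequency limits quoted above can be checked with a few lines of arithmetic. This is only a verification sketch; the helper name `to_days` is illustrative.

```python
MIN_PER_DAY = 24.0 * 60.0
HOURS_PER_DAY = 24.0


def to_days(value, unit):
    """Convert a period given in minutes ('min') or hours ('h') to days."""
    if unit == "min":
        return value / MIN_PER_DAY
    if unit == "h":
        return value / HOURS_PER_DAY
    raise ValueError("unit must be 'min' or 'h'")


# Expected pulsation range and the wider periodogram range:
limits = {
    "pulsation, short": to_days(20.0, "min"),
    "pulsation, long": to_days(3.0, "h"),
    "periodogram, short": to_days(5.0, "min"),
    "periodogram, long": to_days(10.0, "h"),
}
for label, period_d in limits.items():
    print("%-20s P = %.6f d, f = %.1f d^-1" % (label, period_d, 1.0 / period_d))
```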

We will use the _Generalized Lomb-Scargle Periodogram_ method, as described in [Zechmeister and Kürster, 2009](https://www.aanda.org/articles/aa/full_html/2009/11/aa11296-08/aa11296-08.html) and implemented in the GitHub repository [mzechmeister/GLS](https://github.com/mzechmeister/GLS), using the default _ZK_ normalization.
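To make the quantity being computed more concrete, the sketch below evaluates the GLS power with the ZK normalization (values between 0 and 1) directly from the weighted sums defined in Zechmeister and Kürster (2009). It is a simplified re-implementation written for illustration only, not the code of the `gls` package used in this notebook. The function name `gls_power_zk` and the synthetic test signal (a 50 d$^{-1}$ sinusoid with 0.1 m/s noise) are assumptions of this example.

```python
import numpy as np


def gls_power_zk(t, y, e_y, freqs):
    """GLS power with ZK normalisation for each frequency in freqs."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(e_y, dtype=float) ** 2
    w /= w.sum()                      # normalised weights (sum to 1)
    t = t - t.min()                   # better numerical precision; fit unaffected
    Y = np.sum(w * y)                 # weighted mean
    YY = np.sum(w * y ** 2) - Y ** 2  # weighted variance
    power = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        arg = 2.0 * np.pi * f * t
        cos_x, sin_x = np.cos(arg), np.sin(arg)
        C = np.sum(w * cos_x)
        S = np.sum(w * sin_x)
        YC = np.sum(w * y * cos_x) - Y * C
        YS = np.sum(w * y * sin_x) - Y * S
        CC = np.sum(w * cos_x ** 2) - C ** 2
        SS = np.sum(w * sin_x ** 2) - S ** 2
        CS = np.sum(w * cos_x * sin_x) - C * S
        D = CC * SS - CS ** 2
        power[k] = (SS * YC ** 2 + CC * YS ** 2 - 2.0 * CS * YC * YS) / (YY * D)
    return power


# Synthetic example (assumed values): short, evenly sampled RV curve.
rng = np.random.default_rng(0)
t = np.arange(500) * 0.0016                       # d
y = 0.9 * np.sin(2.0 * np.pi * 50.0 * t + 0.3) + rng.normal(0.0, 0.1, t.size)
e_y = np.full(t.size, 0.1)                        # m/s
freqs = np.linspace(2.4, 288.0, 1001)             # d^-1
power = gls_power_zk(t, y, e_y, freqs)
f_best = freqs[np.argmax(power)]
print("Peak at f = %.2f d^-1, power = %.3f" % (f_best, power.max()))
```

The explicit loop over frequencies keeps the memory footprint small, at the cost of speed.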

## Modules and configuration

### Modules

In [2]:
# Modules import:
#from collections import OrderedDict
import pandas as pd
import numpy as np
import time

from IPython.display import clear_output
import warnings

from scipy import stats

# https://github.com/mzechmeister/GLS
from gls import Gls

from astropy.table import Table, QTable
#from astropy.timeseries import TimeSeries
from astropy import units as u
from astropy.io import fits
#from astropy.time import Time

import lightkurve as lk

#%matplotlib inline
import matplotlib.pyplot as plt

from pylab import rcParams
rcParams['figure.figsize'] = 11, 11

#import seaborn as sns
#sns.set_style("white", {'figure.figsize':(15,10)})
#sns.set_style("whitegrid")
#sns.set(rc={'figure.figsize':(15,8)})

### Configuration

In [3]:
# Configuration:
# Files and folders (WARNING: THIS FOLDER STRUCTURE MUST EXIST PREVIOUSLY):
GTO_FILE = "../data/RV_ML_subsample_SyntheticDatasets_with_PG.csv" # NOTE: initially this should be a copy of the previous file.
IN_RV_FOLDER = "../data/RV_DATASETS/S2_ts_files/"
OUT_IMG_FOLDER = "../data/S2_RVs_PGs/figures/"
OUT_PROCESSED_FOLDER = "../data/S2_RVs_PGs/"
#OUT_GTO_FILE = "../data/GTO_objects_withRVPG.csv"
#OUT_PG_FILE = "../data/GTO_PGs.csv"
#OUT_ERROR_FILE = "../data/GTO_PG_ERRORs.csv"

# Option to generate / save a full fits file:
FITS_FILE = False

# Periodogram constants:
FBEG = 2.4 # d^{-1}, corresponds to a period P=10 hours
FEND = 288 # d^{-1}, corresponds to a period P=5 min
NUM_POINTS = 7201 # 7200 for S1 and S2 time series (calculated from Tobs=4.2d, f_max=288 d^{-1}, n0 = 5)
PBEG = None # Default
PEND = None # Default
OFAC = 10 # Default
HIFAC = 1 # Default
#FREQ = np.linspace(start=FBEG, stop=FEND, num=1000001) # 1,000,001 points between 4.8 and 144 d^(-1)
FREQ = np.linspace(start=FBEG, stop=FEND, num=NUM_POINTS)
# Must be compatible with FBEG, FEND values
NORM = "ZK" # Default
LS = False # Default
FAST = False # Default
VERBOSE = False # Default
FAP_LEVELS_PLOT = [0.01, 0.05, 0.10] # FAP reference levels to plot

# PULSATION RANGE OF INTEREST (FOR THE PLOTS):
F_LOW = 8 # P=0.1250 d (3 h)
F_HIGH = 72 # P=0.0139 d (20 min)

# Initial timeout value (to prevent a stuck file to interrupt the whole process)
#INITIAL_TIMEOUT = 600 # Seconds
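
As a quick sanity check of the grid defined above, the spacing of `FREQ` can be compared with the natural frequency resolution $1/T_{obs}$. The sketch below only repeats the arithmetic. `T_OBS = 4.2` is taken from the comment next to `NUM_POINTS`, and the variable names are illustrative.

```python
FBEG, FEND, NUM_POINTS = 2.4, 288.0, 7201    # values from the configuration cell
T_OBS = 4.2                                  # d, observation span quoted in the comment

df_grid = (FEND - FBEG) / (NUM_POINTS - 1)   # spacing of np.linspace(FBEG, FEND, NUM_POINTS)
df_natural = 1.0 / T_OBS                     # natural resolution of a 4.2 d time series
oversampling = df_natural / df_grid          # grid points per natural resolution element

print("Grid spacing: %.6f d^-1" % df_grid)
print("1/Tobs:       %.6f d^-1" % df_natural)
print("Oversampling: %.2f" % oversampling)
```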


In [4]:
def rv_load(filename: str):
    '''Load the RV file and return it as a lightkurve LightCurve object'''
    rv_lk = Table.read(filename, format='ascii',
                    names = ['time', 'RV', 'eRV'], units=[u.d, u.meter / u.second, u.meter / u.second])
    rv_lk = lk.LightCurve(time=rv_lk['time'], flux=rv_lk['RV'], flux_err=rv_lk['eRV'])
    return rv_lk

In [5]:
def rv_infer_sampling(rv_lk: lk.LightCurve):
    '''Infer sampling period from light curve'''
    time_diffs = rv_lk['time'][1:] - rv_lk['time'][:-1]
    return np.median(time_diffs)
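
The same median-difference idea can be illustrated with plain NumPy arrays. The sketch below assumes an evenly sampled series with the cadence and time span given by the `S1_Ps` and `S1_Tobs` columns of the objects table (0.0016 d and 4.2 d). It also derives the sampling frequency and the Nyquist frequency, taken here as half the sampling frequency.

```python
import numpy as np

# Assumed regular sampling: 0.0016 d cadence over 4.2 d.
times = np.arange(0.0, 4.2, 0.0016)

psample = np.median(np.diff(times))   # inferred sampling period (d)
fsample = 1.0 / psample               # sampling frequency (d^-1)
fnyq = 0.5 * fsample                  # Nyquist frequency (d^-1)

print("Ps = %.4f d, fs = %.1f d^-1, f_Nyquist = %.2f d^-1" % (psample, fsample, fnyq))
```

Under these assumptions the Nyquist frequency lies above the upper periodogram limit of 288 d$^{-1}$.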

## Data processing

### Sample data loading

In [6]:
# Load the sample objects table (it already holds the S1 results):
gto = pd.read_csv(GTO_FILE, sep=',', decimal='.')
gto.head(5)

Unnamed: 0,ID,Pulsating,frequency,amplitudeRV,offsetRV,refepochRV,phase,S1_Ps,S1_Tobs,dS1_distort_error_stdev,...,e_T0_PG_RV_S1,offset_PG_RV_S1,e_offset_PG_RV_S1,FAP_PG_RV_S1,valid_PG_RV_S1,error_PG_RV_S1,elapsed_time_PG_RV_S1,fits_file_RV_S1,PG_file_RV_S1,fig_file_RV_S1
0,Star-00000,True,52.82092,0.903571,0.0,2457400.0,0.369044,0.0016,4.2,0.015219,...,2e-06,-2.1e-05,0.000436,0.0,1.0,,3.177537,,../data/S1_RVs_PGs/Star-00000_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00000_RV_S1_PG...
1,Star-00001,True,26.23453,0.303306,0.0,2457481.0,0.653046,0.0016,4.2,0.015219,...,5e-06,1e-06,0.000164,0.0,1.0,,3.065831,,../data/S1_RVs_PGs/Star-00001_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00001_RV_S1_PG...
2,Star-00002,True,8.802977,0.484108,0.0,2457405.0,0.88792,0.0016,4.2,0.015219,...,4.5e-05,0.000631,0.000844,0.0,1.0,,3.049875,,../data/S1_RVs_PGs/Star-00002_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00002_RV_S1_PG...
3,Star-00003,True,60.951364,0.35746,0.0,2457412.0,0.213988,0.0016,4.2,0.015219,...,1e-06,5e-06,0.000126,0.0,1.0,,3.218417,,../data/S1_RVs_PGs/Star-00003_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00003_RV_S1_PG...
4,Star-00004,True,69.264691,0.380161,0.0,2457455.0,0.544654,0.0016,4.2,0.015219,...,5e-06,7.3e-05,0.000531,0.0,1.0,,3.200441,,../data/S1_RVs_PGs/Star-00004_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00004_RV_S1_PG...


In [7]:
gto[['S1_file', 'S2_file']]

Unnamed: 0,S1_file,S2_file
0,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-000...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-000...
1,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-000...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-000...
2,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-000...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-000...
3,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-000...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-000...
4,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-000...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-000...
...,...,...
995,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-009...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-009...
996,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-009...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-009...
997,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-009...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-009...
998,../data/RV_DATASETS/S1_ts_files/S1-RV_Star-009...,../data/RV_DATASETS/S2_ts_files/S2-RV_Star-009...


In [8]:
gto.tail(5)

Unnamed: 0,ID,Pulsating,frequency,amplitudeRV,offsetRV,refepochRV,phase,S1_Ps,S1_Tobs,dS1_distort_error_stdev,...,e_T0_PG_RV_S1,offset_PG_RV_S1,e_offset_PG_RV_S1,FAP_PG_RV_S1,valid_PG_RV_S1,error_PG_RV_S1,elapsed_time_PG_RV_S1,fits_file_RV_S1,PG_file_RV_S1,fig_file_RV_S1
995,Star-00995,True,45.328345,0.669382,0.0,2457792.0,0.912955,0.0016,4.2,0.015219,...,,,,,,,,,,
996,Star-00996,True,54.607023,1.347077,0.0,2457455.0,0.394111,0.0016,4.2,0.015219,...,,,,,,,,,,
997,Star-00997,True,55.012557,1.08991,0.0,2457430.0,0.784495,0.0016,4.2,0.015219,...,,,,,,,,,,
998,Star-00998,True,31.05558,0.820674,0.0,2457562.0,0.550324,0.0016,4.2,0.015219,...,,,,,,,,,,
999,Star-00999,True,12.181552,0.389735,0.0,2457451.0,0.216874,0.0016,4.2,0.015219,...,,,,,,,,,,


In [9]:
gto.shape

(1000, 69)

In [10]:
print(list(gto.columns))

['ID', 'Pulsating', 'frequency', 'amplitudeRV', 'offsetRV', 'refepochRV', 'phase', 'S1_Ps', 'S1_Tobs', 'dS1_distort_error_stdev', 'S2_errorRV_dist_idx', 'S2_errorRV_dist_name', 'S2_errorRV_dist_loc', 'S2_errorRV_dist_scale', 'S2_errorRV_mean', 'S2_errorRV_median', 'S2_errorRV_stdev', 'dS2_distort_error_stdev', 'S3_sampling_idx', 'S3_Tobs', 'S3_Ps_mean', 'S3_Ps_median', 'S3_Ps_stdev', 'S3_NumPoints', 'dS3_distort_error_stdev', 'S4_errorRV_mean', 'S4_errorRV_median', 'S4_errorRV_stdev', 'dS4_distort_error_stdev', 'S1_file', 'S2_file', 'S3_file', 'S4_file', 'dS1_file', 'dS2_file', 'dS3_file', 'dS4_file', 'n_RV_S1', 'Ps_RV_S1', 'fs_RV_S1', 'wmean_RV_S1', 'wrms_RV_S1', 'info_PG_RV_S1', 'maxP_PG_RV_S1', 'maxSNR_PG_RV_S1', 'rms_PG_RV_S1', 'f_PG_RV_S1', 'e_f_PG_RV_S1', 'Pd_PG_RV_S1', 'e_Pd_PG_RV_S1', 'Ph_PG_RV_S1', 'e_Ph_PG_RV_S1', 'Pm_PG_RV_S1', 'e_Pm_PG_RV_S1', 'A_PG_RV_S1', 'e_A_PG_RV_S1', 'ph_PG_RV_S1', 'e_ph_PG_RV_S1', 'T0_PG_RV_S1', 'e_T0_PG_RV_S1', 'offset_PG_RV_S1', 'e_offset_PG_RV_S1'

Generate the auxiliary columns that will hold the basic periodogram results for the S2 sample.

In [11]:
# Additional columns:
if 'n_RV_S2' not in gto.columns:
    gto['n_RV_S2'] = None # Number of points in RV curve
    gto['Ps_RV_S2'] = None # Sampling period (d)
    gto['fs_RV_S2'] = None # Sampling frequency (d^(-1))
    gto['wmean_RV_S2'] = None # Mean of RV
    gto['wrms_RV_S2'] = None # RMS of RV
    gto['info_PG_RV_S2'] = None # Information text about the PG
    gto['maxP_PG_RV_S2'] = None # Max power value in the PG
    gto['maxSNR_PG_RV_S2'] = None # SNR of the best peak in the PG (amplitude / residuals RMS)
    gto['rms_PG_RV_S2'] = None # RMS value in the PG residuals
    gto['f_PG_RV_S2'] = None # Best frequency in the PG (d^(-1))
    gto['e_f_PG_RV_S2'] = None # Error of the best frequency in the PG (d^(-1))
    gto['Pd_PG_RV_S2'] = None # Best period in the PG (d)
    gto['e_Pd_PG_RV_S2'] = None # Error of the best period in the PG (d)
    gto['Ph_PG_RV_S2'] = None # Best period in the PG (hours)
    gto['e_Ph_PG_RV_S2'] = None # Error of the best period in the PG (hours)
    gto['Pm_PG_RV_S2'] = None # Best period in the PG (minutes)
    gto['e_Pm_PG_RV_S2'] = None # Error of the best period in the PG (minutes)
    gto['A_PG_RV_S2'] = None # Amplitude of the best frequency
    gto['e_A_PG_RV_S2'] = None # Error of the amplitude of the best frequency
    gto['ph_PG_RV_S2'] = None # Phase of the best frequency
    gto['e_ph_PG_RV_S2'] = None # Error of the phase of the best frequency
    gto['T0_PG_RV_S2'] = None # Reference epoch of the best frequency
    gto['e_T0_PG_RV_S2'] = None # Error of the epoch of the best frequency
    gto['offset_PG_RV_S2'] = None # Offset of the best frequency
    gto['e_offset_PG_RV_S2'] = None # Error of the offset of the best frequency
    gto['FAP_PG_RV_S2'] = None # False alarm probability
    gto['valid_PG_RV_S2'] = None # Flag to indicate if the periodogram calculation succeeded (1) or not (0).
    gto['error_PG_RV_S2'] = None # The error raised during processing. Empty if processing was successful.
    gto['elapsed_time_PG_RV_S2'] = None # The time elapsed in calculation
    gto['fits_file_RV_S2'] = None # The name of the processed fits file.
    gto['PG_file_RV_S2'] = None # The name of the periodogram results file (plain text).
    gto['fig_file_RV_S2'] = None # The name of the figure file.


In [12]:
print(list(gto.columns))

['ID', 'Pulsating', 'frequency', 'amplitudeRV', 'offsetRV', 'refepochRV', 'phase', 'S1_Ps', 'S1_Tobs', 'dS1_distort_error_stdev', 'S2_errorRV_dist_idx', 'S2_errorRV_dist_name', 'S2_errorRV_dist_loc', 'S2_errorRV_dist_scale', 'S2_errorRV_mean', 'S2_errorRV_median', 'S2_errorRV_stdev', 'dS2_distort_error_stdev', 'S3_sampling_idx', 'S3_Tobs', 'S3_Ps_mean', 'S3_Ps_median', 'S3_Ps_stdev', 'S3_NumPoints', 'dS3_distort_error_stdev', 'S4_errorRV_mean', 'S4_errorRV_median', 'S4_errorRV_stdev', 'dS4_distort_error_stdev', 'S1_file', 'S2_file', 'S3_file', 'S4_file', 'dS1_file', 'dS2_file', 'dS3_file', 'dS4_file', 'n_RV_S1', 'Ps_RV_S1', 'fs_RV_S1', 'wmean_RV_S1', 'wrms_RV_S1', 'info_PG_RV_S1', 'maxP_PG_RV_S1', 'maxSNR_PG_RV_S1', 'rms_PG_RV_S1', 'f_PG_RV_S1', 'e_f_PG_RV_S1', 'Pd_PG_RV_S1', 'e_Pd_PG_RV_S1', 'Ph_PG_RV_S1', 'e_Ph_PG_RV_S1', 'Pm_PG_RV_S1', 'e_Pm_PG_RV_S1', 'A_PG_RV_S1', 'e_A_PG_RV_S1', 'ph_PG_RV_S1', 'e_ph_PG_RV_S1', 'T0_PG_RV_S1', 'e_T0_PG_RV_S1', 'offset_PG_RV_S1', 'e_offset_PG_RV_S1'

### Batch processing of all RV files

In [13]:
n = len(gto)
n

1000

#### Batch processing

In [14]:
warnings.filterwarnings('ignore')
# Batch processing:
lapse_list = []
median_lapse = None
# NOTE THE FOR LOOP IS SEPARATED INTO SEVERAL "BATCHES", SO AS TO PREVENT POTENTIAL MEMORY PROBLEMS
# OR TOO MUCH TIME ELAPSED
#for i in range(0, 3): # TEST
#for i in range(0, len(gto)): # ALL RECORDS
for i in range(0, 300):
    clear_output(wait=True)
    start_time = time.time()
    # Names:
    karmn = gto.loc[i, 'ID'] # Synthetic star name
    #commn = gto.loc[i, 'Name'] # Common name
    #tic_id = str(gto.loc[i, 'TIC_id']) # TESS TIC identifier
    print("Record: %d, started at %s"
          %(i, time.strftime('%d/%m/%Y, %H:%M:%S', time.localtime(start_time))))
    if median_lapse is None:
        print("Previous median lapse time: %s" %median_lapse)
    else:
        print("Previous median lapse time: %.2f seconds" %median_lapse)
    print("Processing %s star..." %karmn)
    if True: # TEST
    #try:
        # LOAD RV FILE:
        rv_file = gto.loc[i, 'S2_file']
        # NOTE: a modification was needed for synthetic file locations:
        #rv_file = rv_file.replace("./", "../data/")
        print("filename: %s" %rv_file)
        rv_lk = rv_load(rv_file)
        
        # GENERATE PERIODOGRAM:
        gls = Gls((rv_lk['time'].value, rv_lk['flux'].value, rv_lk['flux_err'].value),
              fbeg=FBEG, fend=FEND, Pbeg=PBEG, Pend=PEND, ofac=OFAC, hifac=HIFAC, freq=FREQ,
              norm=NORM, ls=LS, fast=FAST, verbose=VERBOSE)

        # SAVE THE PERIODOGRAM (A PLAIN TEXT FILE):
        pg_filename = OUT_PROCESSED_FOLDER + karmn + "_RV_S2_PG.dat"
        gls.toFile(ofile=pg_filename, header=False)

        # BASIC CALCULATIONS (NEEDED FOR THE TABLE, EVEN IF THE FITS FILE IS NOT SAVED)
        psample = rv_infer_sampling(rv_lk).value
        fsample = 1.0 / psample
        fnyq = 0.5 * fsample # Nyquist frequency is half the sampling frequency
        
        if FITS_FILE == True:
            # GENERATE THE FITS FILE (IF SO INDICATED IN THE OPTIONS):
            # Prepare the fits primary HDU (only header):
            primary_header = fits.Header()
            primary_header['OBJECT'] = (karmn, "KARMENES target name")
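            # NOTE: 'commn' and 'tic_id' are not defined for the synthetic stars (their
            # assignments are commented out at the top of the loop), so these cards are skipped:
            #primary_header['NAME'] = (commn, "Object common name")
            #primary_header['TIC'] = (tic_id, "Object TESS identifier")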
            primary_header['RA_J2000'] = ("00:05:10.89", "Object right ascension (J2000)")
            primary_header['DE_J2000'] = ("+45:47:11.6", "Object declination (J2000)")
            primary_header['SPTYPE'] = ("M1.0 V", "Spectral type")
            primary_header['TEFF_K'] = (3773, "Effective temperature in Kelvin")
            primary_header['LOGG'] = (5.07, "Logarithm of surface gravity")
            primary_header['FEH'] = (-0.04, "Metallicity")
            primary_header['L_LSUN'] = (0.0436229, "Luminosity in Solar luminosities")
            primary_header['R_RSUN'] = (0.48881, "Radius in Solar radii")
            primary_header['M_MSUN'] = (0.4918, "Mass in Solar masses")
            primary_header['D_PC'] = (11.50352803, "Distance in parsec")
            hdu_primary = fits.PrimaryHDU(header=primary_header)

            # Prepare the RV HDU:
            hdu_rv = fits.table_to_hdu(QTable(rv_lk.to_table()))
            hdu_rv.name = "RV_CURVE"
            freq_units = u.d ** (-1)
            hdu_rv.header['OBJECT'] = (karmn, "KARMENES target name")
            hdu_rv.header['PUNIT'] = u.d.to_string(format='fits')
            hdu_rv.header['FUNIT'] = freq_units.to_string(format='fits')
            hdu_rv.header['RVPOINTS'] = (gls.N, "Number of points in the RV curve")
            hdu_rv.header['AVGFLUX'] = (gls._Y, "Average flux of RV curve")
            hdu_rv.header['RMSFLUX'] = (np.sqrt(gls._YY), "Flux RMS of RV curve")
            hdu_rv.header['PSAMPLE'] = (psample, "Inferred cadence in RV curve")
            hdu_rv.header['FSAMPLE'] = (fsample, "Inferred sampling frequency in RV curve")
            hdu_rv.header['FNYQUIST'] = (fnyq, "Calculated Nyquist frequency value")
                
            # Prepare the PG HDU:
            hdu_pg = fits.table_to_hdu(
                QTable(data=[gls.freq, gls.power], names=['freq', 'power'], 
                       units=[1.0 / u.d, (u.m / u.s) ** 2]))
            hdu_pg.name = "GLS_PG"
            fpoints = len(gls.f)
            fres = (gls.fend - gls.fbeg) / (fpoints - 1)
            hdu_pg.header['OBJECT'] = (karmn, "KARMENES target name")
            hdu_pg.header['FUNIT'] = (freq_units.to_string(format='fits'), "Unit for frequencies")
            hdu_pg.header['PUNIT'] = (u.d.to_string(format='fits'), "Unit for periods")
            hdu_pg.header['PK_FREQ'] = (gls.best['f'], "Frequency of the peak in periodogram")
            hdu_pg.header['PK_POW'] = (gls.pmax, "Power of the peak in periodogram")
            hdu_pg.header['PK_SNR'] = (gls.best['amp'] / gls.rms, "SNR of the peak in periodogram")
            hdu_pg.header['PK_FAP'] = (gls.FAP(Pn=None), "FAP of the peak in periodogram")
            hdu_pg.header['RES_RMS'] = (gls.rms, "RMS of residuals in periodogram")
            hdu_pg.header['FSAMPLE'] = (fsample, "Inferred sampling frequency in RV curve")
            hdu_pg.header['FNYQUIST'] = (fnyq, "Calculated Nyquist frequency value")
            hdu_pg.header['FPOINTS'] = (fpoints, "Number of points in periodogram")
            hdu_pg.header['FBEG'] = (gls.fbeg, "Start frequency in periodogram")
            hdu_pg.header['FEND'] = (gls.fend, "End frequency in periodogram")
            hdu_pg.header['FRES'] = (fres, "Frequency resolution in periodogram")
            hdu_pg.header['F'] = (gls.best['f'], "Peak best estimate: frequency")
            hdu_pg.header['E_F'] = (gls.best['e_f'], "Peak best estimate: frequency error")
            hdu_pg.header['P'] = (gls.best['P'], "Peak best estimate: period")
            hdu_pg.header['E_P'] = (gls.best['e_P'], "Peak best estimate: period error")
            hdu_pg.header['A'] = (gls.best['amp'], "Peak best estimate: amplitude")
            hdu_pg.header['E_A'] = (gls.best['e_amp'], "Peak best estimate: amplitude error")
            hdu_pg.header['PH'] = (gls.best['ph'], "Peak best estimate: phase")
            hdu_pg.header['E_PH'] = (gls.best['e_ph'], "Peak best estimate: phase error")
            hdu_pg.header['T0'] = (gls.best['T0'], "Peak best estimate: reference epoch")
            hdu_pg.header['E_T0'] = (gls.best['e_T0'], "Peak best estimate: reference epoch error")
            hdu_pg.header['OFF'] = (gls.best['offset'], "Peak best estimate: offset")
            hdu_pg.header['E_OFF'] = (gls.best['e_offset'], "Peak best estimate: offset error")
            hdu_pg.header['OFAC'] = (gls.ofac, "Setup: oversampling factor")
            hdu_pg.header['HIFAC'] = (HIFAC, "Setup: maximum frequency factor")
            hdu_pg.header['NORM'] = (gls.norm, "Setup: normalization type")
            hdu_pg.header['LS'] = (gls.ls, "Setup: conventional Lomb-Scargle calculation")
            hdu_pg.header['FAST'] = (gls.fast, "Setup: fast evaluation, recursive trigonometric")
        
            # Create and save the fits file (if so desired):
            fits_filename = OUT_PROCESSED_FOLDER + karmn + "_RV_S2_PG.fits"
            hdul = fits.HDUList([hdu_primary, hdu_rv, hdu_pg])
            hdul.writeto(fits_filename, overwrite=True)

            # Delete all the HDUs:
            del hdul
            del hdu_primary
            del hdu_rv
            del hdu_pg
        
        else:
            # Do not create the fits file.
            fits_filename = None

        # FILL IN THE DATA IN THE GTO TABLE:
        gto.loc[i, 'n_RV_S2'] = gls.N
        gto.loc[i, 'Ps_RV_S2'] = psample
        gto.loc[i, 'fs_RV_S2'] = fsample
        gto.loc[i, 'wmean_RV_S2'] = gls._Y
        gto.loc[i, 'wrms_RV_S2'] = np.sqrt(gls._YY)
        gto.loc[i, 'info_PG_RV_S2'] = gls.info(stdout=False)
        gto.loc[i, 'maxP_PG_RV_S2'] = gls.power.max()
        gto.loc[i, 'maxSNR_PG_RV_S2'] = gls.best['amp'] / gls.rms
        gto.loc[i, 'rms_PG_RV_S2'] = gls.rms
        gto.loc[i, 'f_PG_RV_S2'] = gls.best['f']
        gto.loc[i, 'e_f_PG_RV_S2'] = gls.best['e_f']
        gto.loc[i, 'Pd_PG_RV_S2'] = gls.best['P']
        gto.loc[i, 'e_Pd_PG_RV_S2'] = gls.best['e_P']
        gto.loc[i, 'Ph_PG_RV_S2'] = 24.0 * gls.best['P']
        gto.loc[i, 'e_Ph_PG_RV_S2'] = 24.0 * gls.best['e_P']
        gto.loc[i, 'Pm_PG_RV_S2'] = 24.0 * 60.0 * gls.best['P']
        gto.loc[i, 'e_Pm_PG_RV_S2'] = 24.0 * 60.0 * gls.best['e_P']
        gto.loc[i, 'A_PG_RV_S2'] = gls.best['amp']
        gto.loc[i, 'e_A_PG_RV_S2'] = gls.best['e_amp']
        gto.loc[i, 'ph_PG_RV_S2'] = gls.best['ph']
        gto.loc[i, 'e_ph_PG_RV_S2'] = gls.best['e_ph']
        gto.loc[i, 'T0_PG_RV_S2'] = gls.best['T0']
        gto.loc[i, 'e_T0_PG_RV_S2'] = gls.best['e_T0']
        gto.loc[i, 'offset_PG_RV_S2'] = gls.best['offset']
        gto.loc[i, 'e_offset_PG_RV_S2'] = gls.best['e_offset']
        gto.loc[i, 'FAP_PG_RV_S2'] = gls.FAP(Pn=None)
        
        # GENERATE THE FIGURE:
        if gto.loc[i, 'Pulsating'] == True:
            star_type = 'Pulsating'
        else:
            star_type = 'Non-pulsating'
        fig = gls.plot(block=False, period=False,
                       fap=FAP_LEVELS_PLOT, gls=True, data=True, residuals=True)
        # Add the reference lines for predicted pulsations:
        fig.axes[0].axvline(F_LOW, color="darkgray", linestyle="--")
        fig.axes[0].axvline(F_HIGH, color="darkgray", linestyle="--")
        figure_title = "S2: %s (%s). Properties: $\\nu$ = %.4f [$d^{-1}$], A = %.4f [$ms^{-1}$]\n" \
            "Detected: P=%.4f [min], f=%.4f [$d^{-1}$], FAP=%.4f%%" \
            %(karmn, star_type, gto.loc[i, 'frequency'], gto.loc[i, 'amplitudeRV'], \
              gto.loc[i, 'Pm_PG_RV_S2'], gto.loc[i, 'f_PG_RV_S2'], 100.0 * gto.loc[i, 'FAP_PG_RV_S2'])
        fig.suptitle(figure_title, fontdict = {'fontsize' : 36})
        fig.tight_layout()
        # Save the figure to disk:
        fig_file = OUT_IMG_FOLDER + karmn + "_RV_S2_PG.jpg"
        fig.savefig(fig_file)
        plt.close() # Prevent the figure from showing.

        # SET THE RECORD CALCULATION AS VALID AND STORE THE RESULTING FILENAMES
        gto.loc[i, 'valid_PG_RV_S2'] = 1
        gto.loc[i, 'error_PG_RV_S2'] = ""
        gto.loc[i, 'fits_file_RV_S2'] = fits_filename
        gto.loc[i, 'PG_file_RV_S2'] = pg_filename
        gto.loc[i, 'fig_file_RV_S2'] = fig_file

        # UPDATE THE AVERAGE RECORD PROCESSING TIME:
        lapse = time.time() - start_time
        lapse_list.append(lapse)
        median_lapse = np.nanmedian(lapse_list)
        gto.loc[i, 'elapsed_time_PG_RV_S2'] = lapse
        
        # SAVE THE UPDATED GTO TABLE TO DISK:
        gto.to_csv(GTO_FILE, sep=',', decimal='.', index=False)
        
        # Report successful execution:
        print("Elapsed time: %.2f seconds" %lapse)
        print("... SUCCESS.")
        
        # Clear memory (delete the 'gls' object):
        try:
            del gls
        except:
            pass
        
    else: # TEST
    #except Exception as e:
        # Some error happened, establish the record as not valid and record the error:
        error = "*** Some ERROR happened with record #%d, %s star. Error=%s" %(i, karmn, e)
        print(error)
        gto.loc[i, 'valid_PG_RV_S2'] = 0
        gto.loc[i, 'error_PG_RV_S2'] = str(e) # Store the error as text
        
        # Try to update the record, and save the file:
        try:
            # UPDATE THE AVERAGE RECORD PROCESSING TIME:
            lapse = time.time() - start_time
            lapse_list.append(lapse)
            median_lapse = np.nanmedian(lapse_list)
            gto.loc[i, 'elapsed_time_PG_RV_S2'] = lapse
            print("Elapsed time: %.2f seconds" %lapse)

            # SAVE THE UPDATED GTO TABLE TO DISK:
            gto.to_csv(GTO_FILE, sep=',', decimal='.', index=False)
        except Exception as e2:
            error_2 = "*** Additional ERROR happened with record #%d, %s star. Error=%s" %(i, karmn, str(e2))
            print(error_2)
            gto.loc[i, 'error_PG_RV_S2'] = str(gto.loc[i, 'error_PG_RV_S2']) + "/" + str(e2)

        # Clear memory (delete the 'gls' object):
        try:
            del gls
        except:
            pass


Record: 299, started at 11/12/2022, 00:52:17
Previous median lapse time: 3.25 seconds
Processing Star-00299 star...
filename: ../data/RV_DATASETS/S1_ts_files/S1-RV_Star-00299.dat
Results have been written to file:  ../data/S1_RVs_PGs/Star-00299_RV_S1_PG.dat
Elapsed time: 3.46 seconds
... SUCCESS.


## Check the calculations that took longer to complete

We now check which calculations took longest, so that they can be repeated if necessary. A long elapsed time may be caused by the computer itself and not by the data (for example, when the program is paused because the computer goes into "sleep" mode).

In [15]:
gto[['ID', 'elapsed_time_PG_RV_S1']].sort_values(by='elapsed_time_PG_RV_S1', ascending=False) \
    .head(10)

Unnamed: 0,ID,elapsed_time_PG_RV_S1
261,Star-00261,8.319168
215,Star-00215,7.31306
177,Star-00177,6.949609
145,Star-00145,5.909984
119,Star-00119,5.200099
98,Star-00098,5.011801
80,Star-00080,4.479023
66,Star-00066,4.148988
54,Star-00054,4.056217
44,Star-00044,3.869785
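
A simple way to turn this ranking into a list of records to repeat is to compare each elapsed time with a multiple of the median lapse. The sketch below is hypothetical: the function `slow_records` and the factor of 2 are illustrative choices. The reference of 3.25 s is the median lapse reported in the processing log above, and the elapsed times are copied from the first rows of the table.

```python
import statistics


def slow_records(lapses, factor=2.0, reference=None):
    """Return the IDs whose elapsed time exceeds factor * reference.

    lapses: dict mapping object ID -> elapsed time in seconds.
    reference: typical lapse in seconds; defaults to the median of lapses.
    """
    if reference is None:
        reference = statistics.median(lapses.values())
    threshold = factor * reference
    return [star for star, lapse in lapses.items() if lapse > threshold]


lapses = {
    "Star-00261": 8.319168,
    "Star-00215": 7.31306,
    "Star-00177": 6.949609,
    "Star-00145": 5.909984,
    "Star-00119": 5.200099,
}
print(slow_records(lapses, factor=2.0, reference=3.25))
# ['Star-00261', 'Star-00215', 'Star-00177']
```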


## Summary of calculated periodograms

Number of objects with RV PG properly calculated:

In [16]:
gto[gto['valid_PG_RV_S1'] == 1.0]

Unnamed: 0,ID,Pulsating,frequency,amplitudeRV,offsetRV,refepochRV,phase,S1_Ps,S1_Tobs,dS1_distort_error_stdev,...,e_T0_PG_RV_S1,offset_PG_RV_S1,e_offset_PG_RV_S1,FAP_PG_RV_S1,valid_PG_RV_S1,error_PG_RV_S1,elapsed_time_PG_RV_S1,fits_file_RV_S1,PG_file_RV_S1,fig_file_RV_S1
0,Star-00000,True,52.820920,0.903571,0.0,2.457400e+06,0.369044,0.0016,4.2,0.015219,...,0.000002,-0.000021,0.000436,0.0,1.0,,3.353678,,../data/S1_RVs_PGs/Star-00000_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00000_RV_S1_PG...
1,Star-00001,True,26.234530,0.303306,0.0,2.457481e+06,0.653046,0.0016,4.2,0.015219,...,0.000005,0.000001,0.000164,0.0,1.0,,3.231352,,../data/S1_RVs_PGs/Star-00001_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00001_RV_S1_PG...
2,Star-00002,True,8.802977,0.484108,0.0,2.457405e+06,0.887920,0.0016,4.2,0.015219,...,0.000045,0.000631,0.000844,0.0,1.0,,3.045902,,../data/S1_RVs_PGs/Star-00002_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00002_RV_S1_PG...
3,Star-00003,True,60.951364,0.357460,0.0,2.457412e+06,0.213988,0.0016,4.2,0.015219,...,0.000001,0.000005,0.000126,0.0,1.0,,3.105918,,../data/S1_RVs_PGs/Star-00003_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00003_RV_S1_PG...
4,Star-00004,True,69.264691,0.380161,0.0,2.457455e+06,0.544654,0.0016,4.2,0.015219,...,0.000005,0.000073,0.000531,0.0,1.0,,3.118666,,../data/S1_RVs_PGs/Star-00004_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00004_RV_S1_PG...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
295,Star-00295,True,63.782451,1.006607,0.0,2.457486e+06,0.365383,0.0016,4.2,0.015219,...,0.000007,-0.000102,0.001910,0.0,1.0,,3.445510,,../data/S1_RVs_PGs/Star-00295_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00295_RV_S1_PG...
296,Star-00296,True,53.548451,1.268942,0.0,2.457398e+06,0.313807,0.0016,4.2,0.015219,...,0.000008,-0.000041,0.002407,0.0,1.0,,3.416965,,../data/S1_RVs_PGs/Star-00296_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00296_RV_S1_PG...
297,Star-00297,True,64.223171,1.087123,0.0,2.457434e+06,0.608004,0.0016,4.2,0.015219,...,0.000006,0.000273,0.001956,0.0,1.0,,3.583422,,../data/S1_RVs_PGs/Star-00297_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00297_RV_S1_PG...
298,Star-00298,True,29.587050,0.359739,0.0,2.457429e+06,0.261209,0.0016,4.2,0.015219,...,0.000012,-0.000097,0.000580,0.0,1.0,,3.444328,,../data/S1_RVs_PGs/Star-00298_RV_S1_PG.dat,../data/S1_RVs_PGs/figures/Star-00298_RV_S1_PG...


In [17]:
gto[gto['valid_PG_RV_S1'] == 0.0][['ID', 'S1_file', 'valid_PG_RV_S1', 'error_PG_RV_S1']]

Unnamed: 0,ID,S1_file,valid_PG_RV_S1,error_PG_RV_S1


All periodograms were calculated correctly for all the objects in the S1 sample.

# Summary

**OBSERVATIONS AND CONCLUSIONS:**
- We have completed the basic GLS periodogram calculation for the radial velocity (RV) curves of the synthetic sample S1, and stored the results.
- All 1000 objects were calculated correctly.