# CALCULATION OF FREQUENCY GRID FOR TESS LC PERIODOGRAM

This notebook calculates the desirable frequency grid for periodogram calculations, based on the characteristics of the TESS LC time series.

**Calculation details:**

To establish the frequency grid for the periodogram calculation, we need to define three values: the minimum frequency, the maximum frequency, and the frequency grid spacing, following the recommendations in Section 7.1 of _Understanding the Lomb–Scargle Periodogram (VanderPlas, J. T. 2018, ApJS, 236(1), 16)_. The recommendations given there are:

- For the lower frequency limit, set it to 0; this makes no noticeable difference in terms of computational load.
- For the upper frequency limit, set it to the Nyquist frequency (uniform sampling) or the pseudo-Nyquist frequency (non-uniform sampling) or, even better, choose it according to the specific science case (i.e. according to the periods we are searching for in the signals).
- For the frequency grid spacing, use the inverse of the observation window multiplied by an additional factor between 1/5 and 1/10.
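
The three recommendations above can be condensed into a small helper. The following is a minimal sketch in plain Python, not the code used later in this notebook; the function name `frequency_grid_params`, its arguments, and the example values are hypothetical and chosen only for illustration.

```python
import math


def frequency_grid_params(t_obs, f_min, f_max, n0=10):
    """Return (delta_f, n_points) for a uniform frequency grid.

    t_obs        : length of the observation window, in days
    f_min, f_max : frequency limits, in 1/d
    n0           : oversampling factor (the text recommends 5 to 10)
    """
    if t_obs <= 0 or n0 <= 0 or f_max <= f_min:
        raise ValueError("need t_obs > 0, n0 > 0 and f_max > f_min")
    # Grid spacing: inverse of the observation window, divided by n0
    delta_f = 1.0 / (n0 * t_obs)
    # Number of grid nodes needed to span [f_min, f_max] with steps of at most delta_f
    n_points = math.ceil((f_max - f_min) * n0 * t_obs) + 1
    return delta_f, n_points


# Illustrative values only (not taken from the data):
delta_f, n_points = frequency_grid_params(t_obs=25.0, f_min=0.0, f_max=360.0, n0=10)
print(f"delta_f = {delta_f:.4f} 1/d, n_points = {n_points}")
```

Doubling either the observation window or the factor `n0` halves the spacing and roughly doubles the number of frequencies to evaluate.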

The stars currently under analysis are in the $He_{3}$ instability band, so they are expected to have effective temperatures in the range $T_{eff}\in[3300, 4300] K$, $\log g\in[4.5, 5.1]$, and masses in the range $M_{star}\in[0.20, 0.60] M_{Sun}$.

According to _Table 2_ in _The theoretical instability strip of M dwarf stars (Rodríguez-López, C., et al. 2014, MNRAS, 438, 2371)_, these stars have typical periods of $20 min$ to $4 h$. We therefore set the limits of the periodogram frequencies to cover periods between $5 min$ and $10 h$, which fully covers the target range and leaves enough margin in case the stacked periodogram shows some "plateau". In days, this corresponds to a range of $P\in[0.003472, 0.416667] d$ or, equivalently, frequencies in the range $f\in[2.4, 288.0] d^{-1}$.
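
The conversion from the chosen period limits to days and to frequencies can be reproduced with a few lines of arithmetic. This is only a sketch; the variable names are introduced here for illustration.

```python
MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24

# Period limits chosen for the periodogram, converted to days
p_min_d = 5 / MINUTES_PER_DAY    # 5 min
p_max_d = 10 / HOURS_PER_DAY     # 10 h

# Frequency is the inverse of the period, so the limits swap roles
f_max = 1.0 / p_min_d
f_min = 1.0 / p_max_d

print(f"P in [{p_min_d:.6f}, {p_max_d:.6f}] d")
print(f"f in [{f_min:.1f}, {f_max:.1f}] 1/d")
```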

For the grid spacing, we need to cover the observation windows of all the objects under study, which differ from one object to another. We therefore calculate it by inspecting each of the source TESS LC files. For simplicity, we consider only the overall observation time span of each object, rather than the lengths of the separate observation periods that exist within each light curve. The value used is thus the maximum possible observation time, which yields the smallest possible grid spacing (i.e. the highest resolution). Compared with the RV time series, the light curves have many more points and probably more continuous coverage, so we use the 1/10 factor.
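
Written out, the spacing rule described above reads as follows. The symbols $T_{obs,i}$ (overall time span of the $i$-th light curve) and $N_0$ (the inverse of the additional factor) are notation assumed here for illustration:

$$\Delta f \le \frac{1}{N_0 \, \max_i T_{obs,i}}, \qquad N_0 = 10$$

Since $\Delta f$ scales as $1/T_{obs}$, a spacing derived from the longest time span is fine enough for every object in the sample.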


## Modules and configuration

### Modules

In [6]:
# Modules import:
import numpy as np
import pandas as pd

from astropy.time.core import TimeDelta

#from scipy import stats

import lightkurve as lk

from matplotlib import pyplot as plt
from matplotlib import lines
import seaborn as sns

sns.set_style("white", {'figure.figsize':(16,9)})


### Configuration

In [7]:
# Configuration:
# Files and folders (WARNING: THIS FOLDER STRUCTURE MUST EXIST PREVIOUSLY):
#GTO_FILE = "../data/CARM_VIS_objects_with_PG.csv" # NOTE: initially this should be a copy of the previous file.
GTO_FILE = "../data/SELECTION_for_PG_CARM_VIS_objects_with_PG.csv" # NOTE: initially this should be a copy of the previous file.

N0 = 10 # For the factor (1/N0 = 1/5, 1/10,...)


### Functions

In [8]:
def tess_lc_load(filename: str):
    '''Load a TESS LC file and return a lightkurve light curve object with NaN values removed.
    Note: dropping 'nan' values seems to be necessary for GLS to work properly.'''
    lc_lk = lk.read(filename).remove_nans()
    return lc_lk

In [9]:
def draw_scatter(data, x, y, hue=None, size=None, alpha=None,
                 href_lines=(), href_label="", href_color="darkred",
                 vref_lines=(), vref_label="", vref_color="darkblue",
                 title_override=None,
                 fig_filename=None):
    '''Draws a scatter plot as per the data passed, setting the title'''
    sns.set_style("white", {'figure.figsize':(16,9)})
    ax = sns.scatterplot(data=data, x=x, y=y, hue=hue, size=size, alpha=alpha)
    handles = []
    # Draw the horizontal reference lines; one legend entry stands for all of them:
    if len(href_lines) > 0:
        handles.append(lines.Line2D([], [], color=href_color, label=href_label))
    for y_ref in href_lines:
        ax.axhline(y=y_ref, color=href_color)
    # Same for the vertical reference lines:
    if len(vref_lines) > 0:
        handles.append(lines.Line2D([], [], color=vref_color, label=vref_label))
    for x_ref in vref_lines:
        ax.axvline(x=x_ref, color=vref_color)

    if title_override is None:
        title = "Correlation - " + x + " vs. " + y
    else:
        title = title_override
    if hue is not None:
        title = title + "\nBy " + hue + " (colour)"
    if size is not None:
        title = title + "\nBy " + size + " (size)"
    ax.set_title(title, fontsize='x-large')
    ax.set_xlabel(x, fontsize='large')
    ax.set_ylabel(y, fontsize='large')
    ax.figure.set_size_inches(16, 9)
    ax.legend(handles=handles, loc='upper right')
    
    if fig_filename is not None:
        ax.figure.savefig(fig_filename, format='jpg')

## Data processing

### GTO data loading

In [10]:
# Load GTO data table:
gto = pd.read_csv(GTO_FILE, sep=',', decimal='.')
gto.head(5)

Unnamed: 0,Karmn,Name,Comp,GJ,RA_J2016_deg,DE_J2016_deg,RA_J2000,DE_J2000,l_J2016_deg,b_J2016_deg,...,WF_offset_PG_TESS,WF_e_offset_PG_TESS,WF_FAP_PG_TESS,WF_valid_PG_TESS,WF_error_PG_TESS,WF_elapsed_time_PG_TESS,WF_plain_file_TESS,WF_fig_file_TESS,PG_file_RV,PG_file_TESS
0,J23548+385,RX J2354.8+3831,-,,358.713658,38.52634,23:54:51.46,+38:31:36.2,110.941908,-23.024449,...,999.999756,2.151008e-06,1.0,1.0,,94.758838,../data/CARM_VIS_TESS_WinFunc_PGs/WF_J23548+38...,../data/CARM_VIS_TESS_WinFunc_PGs/figures/WF_J...,../data/CARM_VIS_RVs_PGs/J23548+385_RV_PG.dat,../data/CARM_VIS_TESS_PGs/J23548+385_RV_PG.dat
1,J23505-095,LP 763-012,-,4367.0,357.634705,-9.560964,23:50:31.64,-09:33:32.7,80.777067,-67.303426,...,1000.000122,9.022946e-07,1.0,1.0,,132.607176,../data/CARM_VIS_TESS_WinFunc_PGs/WF_J23505-09...,../data/CARM_VIS_TESS_WinFunc_PGs/figures/WF_J...,../data/CARM_VIS_RVs_PGs/J23505-095_RV_PG.dat,../data/CARM_VIS_TESS_PGs/J23505-095_RV_PG.dat
2,J23431+365,GJ 1289,-,1289.0,355.781509,36.53631,23:43:06.31,+36:32:13.1,107.922839,-24.336479,...,999.999512,4.306074e-06,1.0,1.0,,97.939914,../data/CARM_VIS_TESS_WinFunc_PGs/WF_J23431+36...,../data/CARM_VIS_TESS_WinFunc_PGs/figures/WF_J...,../data/CARM_VIS_RVs_PGs/J23431+365_RV_PG.dat,../data/CARM_VIS_TESS_PGs/J23431+365_RV_PG.dat
3,J23381-162,G 273-093,-,4352.0,354.532687,-16.236514,23:38:08.16,-16:14:10.2,61.845437,-69.82522,...,1000.000122,9.022946e-07,1.0,1.0,,136.603404,../data/CARM_VIS_TESS_WinFunc_PGs/WF_J23381-16...,../data/CARM_VIS_TESS_WinFunc_PGs/figures/WF_J...,../data/CARM_VIS_RVs_PGs/J23381-162_RV_PG.dat,../data/CARM_VIS_TESS_PGs/J23381-162_RV_PG.dat
4,J23245+578,BD+57 2735,-,895.0,351.126628,57.853057,23:24:30.51,+57:51:15.5,111.552287,-3.085183,...,999.999512,3.720858e-06,1.0,1.0,,131.327304,../data/CARM_VIS_TESS_WinFunc_PGs/WF_J23245+57...,../data/CARM_VIS_TESS_WinFunc_PGs/figures/WF_J...,../data/CARM_VIS_RVs_PGs/J23245+578_RV_PG.dat,../data/CARM_VIS_TESS_PGs/J23245+578_RV_PG.dat


In [11]:
gto.shape

(269, 300)

In [12]:
print(list(gto.columns))

['Karmn', 'Name', 'Comp', 'GJ', 'RA_J2016_deg', 'DE_J2016_deg', 'RA_J2000', 'DE_J2000', 'l_J2016_deg', 'b_J2016_deg', 'Ref01', 'SpT', 'SpTnum', 'Ref02', 'Teff_K', 'eTeff_K', 'logg', 'elogg', '[Fe/H]', 'e[Fe/H]', 'Ref03', 'L_Lsol', 'eL_Lsol', 'Ref04', 'R_Rsol', 'eR_Rsol', 'Ref05', 'M_Msol', 'eM_Msol', 'Ref06', 'muRA_masa-1', 'emuRA_masa-1', 'muDE_masa-1', 'emuDE_masa-1', 'Ref07', 'pi_mas', 'epi_mas', 'Ref08', 'd_pc', 'ed_pc', 'Ref09', 'Vr_kms-1', 'eVr_kms-1', 'Ref10', 'ruwe', 'Ref11', 'U_kms-1', 'eU_kms-1', 'V_kms-1', 'eV_kms-1', 'W_kms-1', 'eW_kms-1', 'Ref12', 'sa_m/s/a', 'esa_m/s/a', 'Ref13', 'SKG', 'Ref14', 'SKG_lit', 'Ref14_lit', 'Pop', 'Ref15', 'vsini_flag', 'vsini_kms-1', 'evsini_kms-1', 'Ref16', 'P_d', 'eP_d', 'Ref17', 'pEWHalpha_A', 'epEWHalpha_A', 'Ref18', 'log(LHalpha/Lbol)', 'elog(LHalpha/Lbol)', 'Ref19', '1RXS', 'CRT_s-1', 'eCRT_s-1', 'HR1', 'eHR1', 'HR2', 'eHR2', 'Flux_X_E-13_ergcm-2s-1', 'eFlux_X_E-13_ergcm-2s-1', 'LX/LJ', 'eLX/LJ', 'Ref20', 'Activity', 'Ref21', 'FUV_mag',

### Calculate the main sampling parameters for each time series

In [13]:
sampling_df = pd.DataFrame(columns=['Karmn', 'T_obs',
                                    'delta_t', 'f_nyq',
                                    'valid_for_PG'],
                           dtype=float)
sampling_df

Unnamed: 0,Karmn,T_obs,delta_t,f_nyq,valid_for_PG


#### Calculate the observation period for each curve

This could be somewhat misleading, as many of the TESS light curves actually consist of two periods of continuous data points with a gap between them, and we take the overall observation period without taking those gaps into account.

However, this approach keeps us on the safe side when choosing the spacing of the frequency grid for periodogram analysis, because a longer observation period forces us to choose a finer frequency spacing.
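
To illustrate why the overall span is the conservative choice, the sketch below compares the spacing derived from the full span with the spacing derived from the longest continuous segment. The time stamps are synthetic and the helper `split_segments` is hypothetical; neither is part of the notebook's pipeline.

```python
def split_segments(times, max_gap):
    """Split sorted time stamps into runs whose consecutive spacing is <= max_gap."""
    if not times:
        return []
    segments = [[times[0]]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > max_gap:
            segments.append([cur])      # a gap starts a new segment
        else:
            segments[-1].append(cur)
    return segments


# Synthetic light curve: two 12-day runs at 0.5 d cadence, separated by a 3-day gap
times = [0.5 * k for k in range(25)] + [15.0 + 0.5 * k for k in range(25)]

segments = split_segments(times, max_gap=1.0)
span_total = times[-1] - times[0]
span_longest = max(seg[-1] - seg[0] for seg in segments)

n0 = 10
print(f"Spacing from overall span:    {1.0 / (n0 * span_total):.5f} 1/d")
print(f"Spacing from longest segment: {1.0 / (n0 * span_longest):.5f} 1/d")
```

The overall span always gives the smaller of the two spacings, so a grid built from it is at least as fine as one built from any individual segment.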

In [14]:
#for i in range(2, 4): # TEST
for i in range(0, len(gto)):
    karmn = gto.loc[i, 'Karmn']
    try:
        # Read the TESS LC file:
        lc = tess_lc_load(gto.loc[i, 'lc_file'])
        # Calculate the observation time:
        t_obs = lc.time.max() - lc.time.min()
    except Exception:
        # Missing or unreadable file: flag the observation time as unknown.
        t_obs = np.nan
    # Populate a new row:
    # (Is this "pythonic"? Using "loc" for a row that is not there yet?)
    sampling_df.loc[i, 'Karmn'] = karmn
    sampling_df.loc[i, 'T_obs'] = t_obs
    sampling_df.loc[i, 'valid_for_PG'] = (gto.loc[i, 'valid_PG_RV'] == 1) & \
        (gto.loc[i, 'valid_PG_TESS'] == 1)



In [15]:
len(sampling_df)

269

In [16]:
sampling_df.head()

Unnamed: 0,Karmn,T_obs,delta_t,f_nyq,valid_for_PG
0,J23548+385,23.0302046966292,,,True
1,J23505-095,27.405986226444384,,,True
2,J23431+365,22.9884193984758,,,True
3,J23381-162,27.40572921401008,,,True
4,J23245+578,26.48502314532448,,,True


In [17]:
sampling_df.dtypes

Karmn            object
T_obs            object
delta_t         float64
f_nyq           float64
valid_for_PG     object
dtype: object

Convert the time deltas to values:

In [18]:
sampling_df['T_obs'] = sampling_df['T_obs'] \
    .map(lambda x: x.value if isinstance(x, TimeDelta) else float(x))
sampling_df['T_obs'] = sampling_df['T_obs'].astype(float)

In [19]:
sampling_df.dtypes

Karmn            object
T_obs           float64
delta_t         float64
f_nyq           float64
valid_for_PG     object
dtype: object

In [20]:
sampling_df[['T_obs']].describe()

Unnamed: 0,T_obs
count,269.0
mean,24.923213
std,1.626958
min,20.277481
25%,23.807982
50%,24.867013
75%,26.318173
max,27.417075


#### Forcing sampling period to its fixed (and almost unique) value

In this case, as opposed to the RV time series case, we just take a fixed value for the sampling period in TESS light curves, set to $0.001389\;d$, ignoring the very few exceptions to this rule found in that database.
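
The fixed value of $0.001389\;d$ corresponds to 2 minutes expressed in days. As a hedged sketch, the cadence of a time series can be estimated as the median spacing between consecutive time stamps, which is insensitive to a few gaps; the helper `median_cadence` and the synthetic time stamps below are assumptions made for illustration only.

```python
import statistics

MINUTES_PER_DAY = 24 * 60
two_min_in_days = 2.0 / MINUTES_PER_DAY


def median_cadence(times):
    """Median spacing between consecutive (sorted) time stamps."""
    if len(times) < 2:
        raise ValueError("need at least two time stamps")
    return statistics.median(b - a for a, b in zip(times, times[1:]))


# Synthetic time stamps: two runs at 2-minute cadence with a gap in between
times = [k * two_min_in_days for k in range(100)] \
    + [1.0 + k * two_min_in_days for k in range(100)]

cadence = median_cadence(times)
print(f"2 min in days:  {two_min_in_days:.6f}")
print(f"Median cadence: {cadence:.6f}")
```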

In [21]:
sampling_df['delta_t'] = 0.001389
sampling_df.loc[sampling_df['T_obs'].isna(), 'delta_t'] = np.nan

In [22]:
sampling_df.head()


Unnamed: 0,Karmn,T_obs,delta_t,f_nyq,valid_for_PG
0,J23548+385,23.030205,0.001389,,True
1,J23505-095,27.405986,0.001389,,True
2,J23431+365,22.988419,0.001389,,True
3,J23381-162,27.405729,0.001389,,True
4,J23245+578,26.485023,0.001389,,True


#### Calculate the Nyquist frequencies

In [23]:
sampling_df['f_nyq'] = 1.0 / (2.0 * sampling_df['delta_t'])
sampling_df.head()

Unnamed: 0,Karmn,T_obs,delta_t,f_nyq,valid_for_PG
0,J23548+385,23.030205,0.001389,359.971202,True
1,J23505-095,27.405986,0.001389,359.971202,True
2,J23431+365,22.988419,0.001389,359.971202,True
3,J23381-162,27.405729,0.001389,359.971202,True
4,J23245+578,26.485023,0.001389,359.971202,True


#### Statistics

In [24]:
sampling_df.describe()

Unnamed: 0,T_obs,delta_t,f_nyq
count,269.0,269.0,269.0
mean,24.923213,0.001389,359.9712
std,1.626958,8.038051e-18,1.138987e-13
min,20.277481,0.001389,359.9712
25%,23.807982,0.001389,359.9712
50%,24.867013,0.001389,359.9712
75%,26.318173,0.001389,359.9712
max,27.417075,0.001389,359.9712


In this TESS case, the Nyquist frequency of all the light curves is above the expected pulsation frequency range of $f_{Pulsation}\in[8.0,\;72.0]\;d^{-1}$, and even above the chosen calculation range of $f\in[2.4,\;288.0]\;d^{-1}$.

Hence, the TESS light curves should be adequate for detecting the searched-for pulsation frequencies, if present. The difficulty will probably lie in the very low amplitude of those signals, which the stacked periodogram is intended to address.
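
The comparison between the Nyquist frequency and the upper limit of the search range can be made explicit. This is a minimal sketch using the values quoted in this notebook; the function and variable names are illustrative.

```python
def nyquist_frequency(delta_t):
    """Nyquist frequency (1/d) for a uniform sampling period delta_t (d)."""
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    return 1.0 / (2.0 * delta_t)


delta_t = 0.001389        # fixed sampling period, in days
f_max_search = 288.0      # upper limit of the search range, in 1/d

f_nyq = nyquist_frequency(delta_t)
margin = f_nyq / f_max_search

print(f"Nyquist frequency: {f_nyq:.4f} 1/d")
print(f"Search range below Nyquist: {f_max_search < f_nyq} (ratio {margin:.2f})")
```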

### Calculate the minimum frequency grid spacing

In [25]:
max_T = np.nanmax(sampling_df['T_obs'])
max_T

27.417074908569248

In [26]:
delta_f = 1.0 / (N0 * max_T)
print("Frequency grid spacing must be smaller than %f d^(-1)" %delta_f)

Frequency grid spacing must be smaller than 0.003647 d^(-1)


Hence, a frequency grid spacing of about $0.003\;d^{-1}$ should do well:

In [27]:
n = int((288.0 - 2.4) / 0.003)
print("Estimated number of points for each periodogram: %d" %n)

Estimated number of points for each periodogram: 95200


This is $\approx95000$ points for each periodogram, which seems reasonable, although for the TESS light curves the number of time points is much larger than for the RV time series, so the calculation will probably take longer.
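
Strictly speaking, the estimate above counts grid intervals; a grid that includes both frequency limits has one more node. The sketch below builds such a grid explicitly in plain Python; the constant names are assumptions for illustration and the notebook's downstream code may construct its grid differently.

```python
F_MIN = 2.4       # lower frequency limit, 1/d
F_MAX = 288.0     # upper frequency limit, 1/d
DELTA_F = 0.003   # grid spacing, 1/d

# round() rather than int() guards against floating-point truncation
n_intervals = round((F_MAX - F_MIN) / DELTA_F)

# Build each node from its integer index to avoid accumulating rounding errors
freqs = [F_MIN + k * DELTA_F for k in range(n_intervals + 1)]

print(f"{n_intervals} intervals, {len(freqs)} grid nodes")
```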

### Checking the Nyquist criterion

As we saw, we assumed the same Nyquist frequency for all the curves.

In [28]:
sampling_df[['f_nyq']].describe()

Unnamed: 0,f_nyq
count,269.0
mean,359.9712
std,1.138987e-13
min,359.9712
25%,359.9712
50%,359.9712
75%,359.9712
max,359.9712


We see that the Nyquist frequency is $\approx360\;d^{-1}$. It lies above the searched-for range of $f\in[2.4, 288.0] d^{-1}$, so in principle the _TESS_ light curves should be appropriate for periodogram analysis, and could serve as a benchmark for the results of the periodogram analysis of the RV time series.

# Summary

**OBSERVATIONS AND CONCLUSIONS:**
- For the TESS LC periodogram calculations, we will set:
  - A lower frequency of $2.4\;d^{-1}$.
  - An upper frequency of $288.0\;d^{-1}$.
  - A grid spacing of $0.003\;d^{-1}$. This would yield a total of 95200 frequency points to calculate for each periodogram.
- Also, we have seen that the Nyquist frequency of $\approx360\;d^{-1}$ is above the frequency range to be considered, so the _TESS_ light curves should be appropriate for a periodogram analysis.