# Time series analysis in unevenly-spaced datasets

In this assignment, we'll work with the same time series dataset as in the pre-class assignment: the Galactic [low-mass X-ray binary](https://en.wikipedia.org/wiki/X-ray_binary#Low-mass_X-ray_binary) GX 5-1 ([Jonker et al. 2002](http://adsabs.harvard.edu/abs/2002MNRAS.333..665J)).  The difference is that in class today we'll use the [Lomb-Scargle Periodogram](https://en.wikipedia.org/wiki/Least-squares_spectral_analysis#The_Lomb.E2.80.93Scargle_periodogram), which can extract periodic signals from irregularly-spaced time series.

Some additional information on the Lomb-Scargle periodogram:

* ["Understanding the Lomb-Scargle Periodogram"](https://arxiv.org/abs/1703.09824) - a monograph by [Jake VanderPlas](http://jakevdp.github.io/)  ([accompanying blog post](http://jakevdp.github.io/blog/2017/03/30/practical-lomb-scargle/); [accompanying source code repository](https://github.com/jakevdp/PracticalLombScargle/); [example of usage](http://jakevdp.github.io/blog/2015/06/13/lomb-scargle-in-python/))

In [None]:
import numpy as np
import numpy.random as npr
%matplotlib inline
import matplotlib.pyplot as plt

# get the data!
counts = np.loadtxt("GX.dat")
print("original array shape:", counts.shape)

# the array is the wrong shape because the data is structured oddly.  The counts
# are in order in time, and meant to be read left to right, top to bottom.  (So
# each row is contiguous in time, and the following row comes after it in time.)
# we can sort this out by reshaping the array to be 1D
counts_regular = np.reshape(counts,counts.size,order='C')
print("NEW array shape:     ", counts_regular.shape)

# make an array of times.  This uses the same dataset, which is sampled every
# 1/128 seconds for a total of 512 seconds.
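# (built from integer sample indices, so the length always matches counts_regular;
# np.arange with a floating-point step can be off by one element)
times_regular = np.arange(counts_regular.size) / 128.0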

Before we do anything with Lomb-Scargle, let's first take a quick look at a FFT of the regularly-spaced data to see what to expect:

In [None]:
import scipy.fftpack

# do a FFT assuming everything is real
yfr = scipy.fftpack.rfft(counts_regular)

# frequency bins (remember, rfft returns an array of the same length you put in,
# but alternates real/imaginary components, so we only want a frequency array
# that's half the size of the counts array).  The odd-indexed entries are the
# real parts at frequencies k/(N*dt) for k = 1 ... N//2, so the grid starts at
# the first nonzero frequency rather than at zero.
dt = 1./128.
xf = np.arange(1, counts_regular.size//2 + 1) / (counts_regular.size*dt)

# only plotting the real parts of the array returned by rfft
plt.plot(xf,2.0/counts_regular.size * np.abs(yfr[1::2]),'b-')
plt.xlabel('frequency [Hz]')
plt.ylabel('power [arbitrary]')
plt.title('FFT of count rates')
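
The cell above plots only the real components of the packed `rfft` output. As a hedged alternative (a sketch, not the method this notebook uses), `np.fft.rfft` returns complex coefficients and `np.fft.rfftfreq` returns the matching frequency grid, so the power can include both real and imaginary parts. The signal below is synthetic (a 10 Hz sinusoid, an assumption for illustration), not the GX 5-1 data.

```python
import numpy as np

# Synthetic stand-in signal (assumption): 10 Hz sinusoid sampled at 128 Hz for 8 s.
dt = 1.0 / 128.0
n = 1024
t = np.arange(n) * dt
signal = 3.0 + np.sin(2.0 * np.pi * 10.0 * t)

# Complex FFT of the mean-subtracted, real-valued signal (n//2 + 1 coefficients).
coeffs = np.fft.rfft(signal - signal.mean())

# Matching frequencies in Hz, from 0 up to the Nyquist frequency 1/(2*dt).
freqs = np.fft.rfftfreq(n, d=dt)

# Power uses both the real and imaginary parts.
power = np.abs(coeffs) ** 2

peak_freq = freqs[np.argmax(power)]
print("peak frequency [Hz]:", peak_freq)
```

Because the frequency grid comes from `rfftfreq`, there is no need to build it by hand or to slice a packed array.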


Now, let's make the dataset irregularly spaced in time!  We're going to randomly subsample the data, and then mask out alternating windows of data (because, e.g., we can't observe many things during the daytime with an optical telescope).  The fraction of the original data that we keep is about `0.5 * frac`, since half of the data is thrown away by the windowing.

In [None]:
frac = 0.5  # fraction of data to keep (before windowing)
window = 256  # size of windows to mask out 

npr.seed(8675309)  # choose a seed to ensure reproducibility

# create a boolean array where only frac of the data is 'True'.
bool_array = npr.rand(counts_regular.size) < frac

# we are going to make half of the boolean array False by reshaping it into a 2D array with
# one window per row, setting every other row to False, and then reshaping it back to a 1D
# array.  This is sort of complicated, but it's the most compact way of doing this (otherwise
# we'd have to do a loop).  Note that the array size must be a multiple of window.
bool_array = bool_array.reshape((bool_array.size//window,window))
bool_array[::2,:]=False
bool_array = bool_array.reshape((bool_array.size))

# now we generate a set of irregularly-spaced counts.
counts_irr = counts_regular[bool_array]
times_irr  = times_regular[bool_array]

# plot it out!
# subsampled data is in blue, the original dataset is in red.
# note we're only looking at a small chunk of the data, because it's actually quite a long time series
plt.plot(times_regular,counts_regular,'r.')
plt.plot(times_irr,counts_irr,'bo',alpha=0.5)
plt.xlim(0,20)
plt.xlabel('Time [s]')
plt.ylabel('Counts/bin')
plt.title('Subsampled data [only 20 s shown]')
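
As a quick sanity check on the `0.5 * frac` claim, the sketch below repeats the masking steps on a boolean array alone and measures the fraction kept. The series length `n_samples` is a hypothetical value chosen for illustration (it is not read from `GX.dat`), and it is assumed to be a multiple of `window`.

```python
import numpy as np

rng = np.random.default_rng(8675309)

n_samples = 65536  # hypothetical series length; must be a multiple of window
frac = 0.5         # fraction of data to keep (before windowing)
window = 256       # size of the windows to mask out

# Random subsampling: each point is kept with probability frac.
keep = rng.random(n_samples) < frac

# Windowing: one window per row, then drop every other row.
keep = keep.reshape((n_samples // window, window))
keep[::2, :] = False
keep = keep.reshape(n_samples)

kept_fraction = keep.mean()
print("fraction kept:", kept_fraction, " expected about:", 0.5 * frac)
```

The measured fraction fluctuates slightly around `0.5 * frac` because the subsampling is random.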

## Lomb-Scargle

Now, implement your own version of the Lomb-Scargle Periodogram, Feigelson equations 11.36 and 11.37.  **Note that there is a typo in the numerator of equation 11.36**: the $X_i$ terms should be mean-subtracted, $\tilde{X}_i = X_i - \bar{X}$, or you'll get some odd effects (the nonzero mean leaks power into the periodogram).  VanderPlas also shifts the times in the numerator using $\tau(\nu)$ ($t_i \rightarrow t_i - \tau(\nu)$).  This is optional, but recommended.
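
For reference, the classical form of the periodogram, as given by VanderPlas, is written out below in terms of the mean-subtracted $\tilde{X}_i$ and the frequency $\nu$. The notation and normalization in Feigelson's equations may differ (some texts, for example, divide by the sample variance), so treat this as a guide rather than a transcription of equations 11.36 and 11.37.

$$
P(\nu) = \frac{1}{2}\left\{
\frac{\left[\sum_i \tilde{X}_i \cos 2\pi\nu\,(t_i - \tau)\right]^2}{\sum_i \cos^2 2\pi\nu\,(t_i - \tau)}
+
\frac{\left[\sum_i \tilde{X}_i \sin 2\pi\nu\,(t_i - \tau)\right]^2}{\sum_i \sin^2 2\pi\nu\,(t_i - \tau)}
\right\},
\qquad
\tan\left(4\pi\nu\tau\right) = \frac{\sum_i \sin 4\pi\nu t_i}{\sum_i \cos 4\pi\nu t_i}
$$

In this form the time shift $\tau(\nu)$ appears in both the numerators and the denominators, and it must be recomputed for every frequency.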

Plot the results for a range of frequencies corresponding to the range of frequencies shown in the FFT above (say for frequencies between 1 and 60 Hz in steps of $\Delta f = 0.5$ Hz).


### Now check your results!  

[Astropy](http://www.astropy.org/) is a Python package that contains a wide variety of extremely useful astronomy-related tools.  It includes an [implementation of the Lomb-Scargle periodogram](https://docs.astropy.org/en/stable/timeseries/lombscargle.html) that is probably more efficient than the one we are writing in class, and it's also a good sanity check on your results.

### Questions

1. How well does the Lomb-Scargle result agree with the FFT we did at the beginning of the notebook?
2. Does your result change as you change the subsampling (the variable `frac`) and the size of the window where data is removed (the variable `window`)?  If so, in what way does the result change?

*Put your answers here!*