In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import scipy.optimize as optimize
import datetime
from scipy.fft import fft, fftfreq

# Project: searching for exoplanets

Recall that in week 8 of this course we explored analyzing data in the frequency domain.

This technique is quite powerful, and it is the basis of many searches for exoplanets.
In fact, it was the technique used to discover many of the first exoplanets.

This technique is based on the idea that a planet has a slight effect on its star's motion.  Recall that in a two-body system, both bodies orbit the center of mass of the combined system.  Even though the star typically has much more mass than the planet, the center of mass of the combined system is not exactly at the center of the star.  This causes the star to "wobble".  In technical terms, we say that the radial velocity of the star varies.

<img src="figures/Radial-Velocity-Method-star-orbits.png" width=600>

Astronomers then use the spectral techniques we studied in the second notebook of week 6 to measure the radial velocity of a star to extremely high precision and detect this wobble.  The variation is very small: only dozens of ${\rm m/s}$ for the first exoplanets discovered, and down to about $10\ {\rm cm/s}$ for current state-of-the-art telescopes.
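
To get a feel for the size of the wobble, here is a rough sketch that estimates the star's radial-velocity semi-amplitude for a circular orbit viewed edge-on, using $K = (2\pi G / P)^{1/3}\, m_p / (M_\star + m_p)^{2/3}$.  The function name `rv_semi_amplitude` is made up for this illustration, and the planet and star masses are approximate literature values for 51 Pegasi b assumed here, not values taken from this notebook.

```python
import numpy as np

# Physical constants (SI units)
G = 6.674e-11          # gravitational constant [m^3 kg^-1 s^-2]
M_SUN = 1.989e30       # solar mass [kg]
M_JUP = 1.898e27       # Jupiter mass [kg]
DAY = 86400.0          # seconds per day

def rv_semi_amplitude(period_days, m_planet_kg, m_star_kg):
    """Stellar radial-velocity semi-amplitude [m/s] for a circular, edge-on orbit."""
    period_s = period_days * DAY
    return ((2.0 * np.pi * G / period_s) ** (1.0 / 3.0)
            * m_planet_kg / (m_star_kg + m_planet_kg) ** (2.0 / 3.0))

# Assumed, approximate values for 51 Pegasi b (not taken from this notebook)
K = rv_semi_amplitude(4.23, 0.46 * M_JUP, 1.1 * M_SUN)
print("Estimated semi-amplitude: %.0f m/s" % K)
```

The result is a few tens of m/s, consistent with the "dozens of m/s" quoted above.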

The thing that allows us to be fully confident that we are seeing a real signal is that the wobble is periodic.  Thus, by doing the analysis in the frequency domain, we can confirm the periodic signature of the wobble. 
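
As a small illustration of this idea, the sketch below builds a synthetic sinusoid buried in noise and recovers its period from the peak of the Fourier transform.  All of the numbers (sampling, amplitude, period, noise level) are hypothetical and chosen only to make the example clean; it uses `numpy.fft` rather than the helper functions defined later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed so the sketch is reproducible

# Hypothetical observing setup: 8 measurements per day for 40 days
dt = 0.125                            # sample spacing [days]
t = np.arange(320) * dt               # observation times [days]

# Hypothetical signal: 50 m/s sinusoid with a 4-day period, plus 50 m/s Gaussian noise
true_period = 4.0
v = 50.0 * np.sin(2.0 * np.pi * t / true_period) + rng.normal(0.0, 50.0, size=t.size)

# Amplitude spectrum of the mean-subtracted series
amp = 2.0 / t.size * np.abs(np.fft.rfft(v - v.mean()))
freq = np.fft.rfftfreq(t.size, d=dt)

# Skip the zero-frequency bin when locating the peak
k_peak = 1 + np.argmax(amp[1:])
recovered_period = 1.0 / freq[k_peak]
print("Recovered period: %.2f days" % recovered_period)
```

Each individual measurement here has a noise level equal to the signal amplitude, yet the periodic component still stands out because every sample contributes coherently to the same frequency bin.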

In this project, you will consider two effects that can make it easier or harder to detect a signal.

1. The quality of the measurements.  We will smear out the measurements of the radial velocity, thus degrading the signal.

2. The number of measurements.  The longer we observe a star, and the more measurements we take, the more chances we have to confirm the periodic signature in the radial velocity variations.
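
One way to see why more measurements help is to look at pure noise.  For Gaussian noise of standard deviation $\sigma$ and an amplitude spectrum normalized by $2/N$, the mean noise level per frequency bin is expected to be about $\sigma\sqrt{\pi/N}$, so it falls as the number of samples $N$ grows.  The sketch below checks this numerically; the value of `sigma`, the sample counts, and the helper name `mean_noise_floor` are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
sigma = 100.0                          # assumed per-measurement noise [m/s]

def mean_noise_floor(n_samples):
    """Mean amplitude-spectrum value for pure Gaussian noise with n_samples points."""
    noise = rng.normal(0.0, sigma, size=n_samples)
    amp = 2.0 / n_samples * np.abs(np.fft.rfft(noise))
    return amp[1:-1].mean()            # drop the zero-frequency and Nyquist bins

floors = {}
for n in (1024, 4096):
    floors[n] = mean_noise_floor(n)
    expected = sigma * np.sqrt(np.pi / n)
    print("N = %4d: measured floor %.2f, expected %.2f" % (n, floors[n], expected))
```

Quadrupling the number of samples should roughly halve the noise floor, while the height of a sinusoidal peak stays about the same.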

We have provided a few simple functions, including functions to degrade the data, compute the Fourier transform of the data, and quantify both the size of the signal peak and the amount of background in the Fourier transform.

Here are some potential goals:

1. Run through the four cases in this notebook that demonstrate idealized data, and what happens as it gets degraded.

2. Come up with a way to define the significance of the peak in the Fourier transform and decide if it constitutes a discovery.

3. Explore how the significance changes as we degrade the data, and characterize those changes.

4. Come up with a formula to estimate how long we might have to observe to discover an exoplanet if we have a given amount of noise in our measurement of the radial velocity.

Fun fact: the idealized data we have use the period and radial velocity of 51 Pegasi b, the first exoplanet ever discovered around a "normal" or "main sequence" star.

https://en.wikipedia.org/wiki/51_Pegasi


### Function to plot the $v_{\rm rad}$ as a function of time.

In [None]:
def plot_rvel(date, rvel):
    _ = plt.scatter(date, rvel)
    _ = plt.xlabel(r'Observation Time [days]')
    _ = plt.ylabel(r'$\Delta v_{\rm rad} [\frac{m}{s}]$')

### Function to degrade data by adding noise and / or reducing number of observations

In [None]:
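def degrade_data(date, rvel, noise_scale=0.01, tfrac=1.):
    # Keep only observations taken before a fraction tfrac of the latest date
    tmax = tfrac*np.max(date)
    mask = date < tmax
    # Add Gaussian noise with standard deviation noise_scale to the surviving points
    noise = stats.norm(loc=0, scale=noise_scale).rvs(size=mask.sum())
    return (date[mask], rvel[mask] + noise)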
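
Because `degrade_data` draws fresh random noise on every call, repeated runs give slightly different results.  If reproducibility matters, one option is a seedable variant like the sketch below.  The name `degrade_data_seeded`, its `seed` argument, and the synthetic stand-in data are all hypothetical and not part of the provided functions.  Note also that the strict `<` comparison drops the final observation even when `tfrac=1`.

```python
import numpy as np

def degrade_data_seeded(date, rvel, noise_scale=0.01, tfrac=1.0, seed=None):
    """Hypothetical variant of degrade_data that uses a seedable NumPy generator."""
    rng = np.random.default_rng(seed)
    mask = date < tfrac * np.max(date)          # keep only the first tfrac of the span
    noise = rng.normal(0.0, noise_scale, size=mask.sum())
    return date[mask], rvel[mask] + noise

# Made-up stand-in for the real data: 4 samples per day for 40 days
demo_date = np.linspace(0.0, 40.0, 161)
demo_rvel = 56.0 * np.sin(2.0 * np.pi * demo_date / 4.23)

short_date, short_rvel = degrade_data_seeded(demo_date, demo_rvel,
                                             noise_scale=100.0, tfrac=0.5, seed=42)
again_date, again_rvel = degrade_data_seeded(demo_date, demo_rvel,
                                             noise_scale=100.0, tfrac=0.5, seed=42)
print(len(demo_date), "->", len(short_date), "points")
```

Calling the function twice with the same seed returns identical noisy velocities, which makes it easier to compare significance estimates between runs.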

### Function to take the Fourier transform of a time series

In [None]:
def do_fft(date, rvel, plot=True):
    # Number of sample points
    N = len(date)

    # Arbitrary offset in data
    offset = np.mean(rvel)

    # sample spacing
    T = np.mean(date[1:] - date[0:-1])

    yf = fft(rvel-offset)
    xf = fftfreq(N, T)[:N//2]

    if plot:
        _ = plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))    
        _ = plt.ylabel("Signal at Frequency [a.u.]")
        _ = plt.xlabel(r"Frequency [${\rm days}^{-1}$]")

        freq_max = xf[np.argmax(np.abs(yf[0:N//2]))]
        period = 1./freq_max

        _ = plt.annotate(r"$f \sim %0.2f {\rm days}^{-1}$" % freq_max, (2.0, 35))
        _ = plt.annotate(r"$P \sim %0.2f {\rm days}$" % period, (2.0, 32))
        
    return xf, 2.0/N * np.abs(yf[0:N//2])
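
The period reported by `do_fft` can only land on one of the discrete FFT frequency bins, and the bin spacing is roughly one over the total observing time.  The sketch below shows how coarse that grid is near a 4.23-day period for 40-day and 20-day windows.  The even sampling of 4 points per day is an assumption made for illustration and may not match the actual data file.

```python
import numpy as np

true_period = 4.23                    # approximate period of 51 Pegasi b [days], assumed here
dt = 0.25                             # assumed spacing: 4 evenly spaced samples per day

nearest_period = {}
for n_days in (40, 20):
    n = int(n_days / dt)              # number of samples in the observing window
    freq = np.fft.rfftfreq(n, d=dt)   # FFT bin centers, spaced by 1 / n_days
    k = np.argmin(np.abs(freq - 1.0 / true_period))
    nearest_period[n_days] = 1.0 / freq[k]
    print("%d days: bin spacing %.3f per day, nearest bin period %.2f days"
          % (n_days, freq[1], nearest_period[n_days]))
```

A shorter observation therefore not only collects fewer points, it also pins down the period less precisely.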

### Function to estimate the peak in the Fourier transform, and also to estimate the noise level

In [None]:
def fft_noise_stats(xf, yf):
    min_bin = np.searchsorted(xf, 0.5)
    other_data = yf[min_bin:]
    mean = np.mean(other_data)
    std = np.std(other_data)
    peak = np.max(yf)
    print("The mean and standard deviation of the FFT away from the peak are %0.2f %0.2f" % (mean, std))
    print("The value at the peak is %.2f" % peak)
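
One possible starting point for defining a significance is to ask how many background standard deviations the peak sits above the background mean.  The sketch below does this with a hypothetical helper, `peak_significance`, applied to a toy spectrum.  This is only one candidate definition: FFT amplitudes of noise are not Gaussian-distributed, so the result should not be read directly as a Gaussian "number of sigma".

```python
import numpy as np

def peak_significance(xf, yf, min_freq=0.5):
    """Hypothetical helper: (peak - background mean) / background std."""
    background = yf[np.searchsorted(xf, min_freq):]
    return (np.max(yf) - np.mean(background)) / np.std(background)

# Toy spectrum: background alternating between 1 and 3, with one peak of height 12
toy_xf = np.arange(80) / 40.0
toy_yf = np.where(np.arange(80) % 2 == 0, 1.0, 3.0)
toy_yf[10] = 12.0                      # peak at 0.25 per day, below min_freq

significance = peak_significance(toy_xf, toy_yf)
print("Peak significance: %.1f" % significance)
```

Like `fft_noise_stats`, this treats everything above 0.5 per day as background, which assumes the signal peak lies below that frequency.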

# Case 1: using idealized data

This case represents some very idealized measurements.  There is no instrumental error, and we observe the star a few times a day for 40 days.  This results in a very clear and convincing signal.

In [None]:
data = np.loadtxt("../data/51peg_model_rvs.txt", usecols=range(2))

# This is how we pull out the data from columns in the array.
date = data[:,0] - np.min(data[:,0])
rvel = data[:,1]

plot_rvel(date, rvel)

In [None]:
xf, yf = do_fft(date, rvel)
fft_noise_stats(xf, yf)

# Case 2: shorter observation, but still idealized

In this case we still have no measurement error, but we only observed for 20 days.  We still get a very nice clear signal.

In [None]:
less_data = degrade_data(date, rvel, tfrac=0.5)
plot_rvel(less_data[0], less_data[1])

In [None]:
xf, yf = do_fft(less_data[0], less_data[1])
fft_noise_stats(xf, yf)

# Case 3: full observation time, but noisy data

In this case we have about $100 \frac{m}{s}$ of noise in the measurements of $v_\mathrm{rad}$.  Even though we can't really see a signal in the time series, we can see a pretty clear signal in the Fourier transform.

In [None]:
noisy_data = degrade_data(date, rvel, noise_scale=100.)
plot_rvel(noisy_data[0], noisy_data[1])

In [None]:
xf, yf = do_fft(noisy_data[0], noisy_data[1])
fft_noise_stats(xf, yf)

# Case 4: less observation time, and noisy data

In this case we have both measurement noise and a shorter observation, and the signal is getting really marginal.

In [None]:
noisy_short_data = degrade_data(date, rvel, noise_scale=100., tfrac=0.5)
plot_rvel(noisy_short_data[0], noisy_short_data[1])

In [None]:
xf, yf = do_fft(noisy_short_data[0], noisy_short_data[1])
fft_noise_stats(xf, yf)