In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import emcee
import george
from george import kernels
import corner

%matplotlib notebook

# Non-parametric Measures of Periodicity – Return of the Gaps

**Version 0.1**

* * *

By AA Miller (Northwestern/CIERA)  
20 Sep 2021

In this lecture we will examine non-parametric methods to search for periodic signals in astronomical time series. Earlier this session, we focused extensively on the Lomb-Scargle (LS) periodogram. LS is the "standard" in astronomy, in part because it was the first (good) method developed for noisy and sparse data.

LS is not without warts, however: (i) LS does not handle outliers well, and (ii) LS works best on purely sinusoidal signals.

Given that noise is often non-Gaussian and that some signals (e.g., transiting planets) are not sinusoidal, we will now explore alternative methods to search for periodicity.

## Problem 1) Helper Functions

We need to re-use our helper functions from Lecture III.

**Problem 1a**

Create a function, `gen_periodic_data`, that creates simulated data (including noise) over a grid of user supplied positions:

$$ y = A\,\sin\left(\frac{2\pi x}{P} - \phi\right) + \sigma_y$$

where $A, P, \phi$ are inputs to the function. `gen_periodic_data` should include Gaussian noise, $\sigma_y$, for each output $y_i$.

In [2]:
def gen_periodic_data(x, period=1, amplitude=1, phase=0, noise=0):
    '''Generate periodic data given the function inputs
    
    y = A*sin(2*pi*x/P - phase) + noise
    
    Parameters
    ----------
    x : array-like
        input values at which to evaluate the signal
    
    period : float (default=1)
        period of the periodic signal
    
    amplitude : float (default=1)
        amplitude of the periodic signal
    
    phase : float (default=0)
        phase offset of the periodic signal
    
    noise : float (default=0)
        variance of the noise term added to the periodic signal
    
    Returns
    -------
    y : array-like
        Periodic signal evaluated at all points x
    '''
    
    y = amplitude*np.sin(2*np.pi*x/(period) - phase) + np.random.normal(0, np.sqrt(noise), size=len(x))
    return y

**Problem 1b**

Create a function, `phase_plot`, that takes x, y, and $P$ as inputs to create a phase-folded light curve (i.e., plot the data at their respective phase values given the period $P$).

Include an optional argument, `y_unc`, to include uncertainties on the `y` values, when available.

In [3]:
def phase_plot(x, y, period, y_unc = 0.0):
    '''Create phase-folded plot of input data x, y
    
    Parameters
    ----------
    x : array-like
        data values along abscissa

    y : array-like
        data values along ordinate

    period : float
        period to fold the data
        
    y_unc : array-like
        uncertainty of the y values (default=0.0, i.e., no uncertainties)
    '''    
    phases = (x/period) % 1
    if type(y_unc) == float:
        y_unc = np.zeros_like(x)
        
    plot_order = np.argsort(phases)
    fig, ax = plt.subplots()
    ax.errorbar(phases[plot_order], y[plot_order], y_unc[plot_order],
                 fmt='o', mec="0.2", mew=0.1)
    ax.set_xlabel("phase")
    ax.set_ylabel("signal")
    fig.tight_layout()

**Problem 1c**

Write a function `plotChains` to show the individual chains from the multi-chain MCMC sampler `emcee`.

In [37]:
# define function to plot walker chains
def plotChains(sampler, nburn, paramsNames, nplot=None):
    '''Plot the walker chains and log probability from an emcee sampler'''
    Nparams = len(paramsNames)
    fig, ax = plt.subplots(Nparams+1,1, figsize = (8,2*(Nparams+1)), sharex = True)
    fig.subplots_adjust(hspace = 0)
    ax[0].set_title('Chains')
    xplot = np.arange(sampler.get_chain().shape[0])

    # default to plotting every walker
    if nplot is None:
        nplot = sampler.get_chain().shape[1]
    selected_walkers = np.random.choice(range(sampler.get_chain().shape[1]), nplot, replace=False)
    for i,p in enumerate(paramsNames):
        for w in selected_walkers:
            burn = ax[i].plot(xplot[:nburn], sampler.get_chain()[:nburn,w,i], 
                              alpha = 0.4, lw = 0.7, zorder = 1)
            ax[i].plot(xplot[nburn:], sampler.get_chain(discard=nburn)[:,w,i], 
                       color=burn[0].get_color(), alpha = 0.8, lw = 0.7, zorder = 1)
            
            ax[i].set_ylabel(p)
            if i==Nparams-1:
                ax[i+1].plot(xplot[:nburn], sampler.get_log_prob()[:nburn,w], 
                             color=burn[0].get_color(), alpha = 0.4, lw = 0.7, zorder = 1)
                ax[i+1].plot(xplot[nburn:], sampler.get_log_prob(discard=nburn)[:,w], 
                             color=burn[0].get_color(), alpha = 0.8, lw = 0.7, zorder = 1)
                ax[i+1].set_ylabel('ln P')
            
    return ax


## Problem 2) String Length

The string length method ([Dworetsky 1983](http://adsabs.harvard.edu/abs/1983MNRAS.203..917D)) phase folds the data at trial periods and selects the period that minimizes the total length of the line segments connecting the phase-ordered observations.

<img style="display: block; margin-left: auto; margin-right: auto" src="./images/StringLength.png" align="middle">

<div align="right"> <font size="-3">(credit: Gaveen Freer - http://slideplayer.com/slide/4212629/#) </font></div>
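
To make the idea concrete, here is a minimal sketch on a noiseless toy signal. The helper name `string_length_demo`, the random seed, and the "wrong" trial period of 0.53 d are all illustrative assumptions. The sketch folds the same data at the true period and at an arbitrary wrong period and compares the two string lengths.

```python
import numpy as np

def string_length_demo(t, y, period):
    """Total length of the line connecting phase-ordered (phase, y) points."""
    phase = (t / period) % 1
    order = np.argsort(phase)
    return np.sum(np.hypot(np.diff(phase[order]), np.diff(y[order])))

rng = np.random.default_rng(0)
t = rng.uniform(0, 61, size=47)        # irregular sampling (illustrative values)
y = 2 * np.sin(2 * np.pi * t / 0.7)    # noiseless sinusoid with P = 0.7 d

sl_true = string_length_demo(t, y, 0.7)    # folded at the true period
sl_wrong = string_length_demo(t, y, 0.53)  # folded at an arbitrary wrong period
print(sl_true, sl_wrong)
```

Folding at the true period orders the points along a smooth curve, so the connecting string is short. At the wrong period neighboring phases carry unrelated values of `y`, and the string zig-zags.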

**Problem 2a**

Simulate a light curve with 47 observations over a duration of 61 d, $P = 0.7\,\mathrm{d}$, $A = 2$, and variance of the noise 0.01.

In [4]:
t_obs = np.random.uniform(0,61,size=47)
y = gen_periodic_data(t_obs, period=0.7, amplitude=2, phase=0, noise=0.01)

phase_plot(t_obs, y, 0.7, y_unc = 0.1*np.ones_like(y))


**Problem 2b**

Write a function, `calc_string_length`, that calculates the string length for a phase-folded light curve with observations `x`, `y`, and frequency `f`.

In [5]:
def calc_string_length(x, y, f=1):
    '''Calculate string length for observations at frequency f
    
    Parameters
    ----------
    x : array-like
        input time of observations
    
    y : array-like
        measured signal at input x
    
    f : float (default=1)
        frequency of the test period
        
    Returns
    -------
    sl : float
        String length for the phase-ordered observations
    
    '''
    
    phases = x*f % 1
    sl = np.sum(np.hypot(np.diff(np.sort(phases)), np.diff(y[np.argsort(phases)])))
    return sl

**Problem 2c** 

Write a function `sl_periodogram` to measure the string length for input data `x`, `y`, over a frequency grid `f_grid`.

In [6]:
def sl_periodogram(x, y, f_grid = np.linspace(0.1,10,10)):
    '''Calculate the string length "periodogram"
    
    Parameters
    ----------
    x : array-like
        input time of observations
    
    y : array-like
        measured signal at input x
    
    f_grid : array_like (default=np.linspace(0.1,10,10))
        frequency grid for the period
        
    Returns
    -------
    sl_psd : array_like
        String length at every test frequency f
    
    '''
    
    sl_psd = np.zeros_like(f_grid)
    for f_num, f in enumerate(f_grid):
        sl_psd[f_num] = calc_string_length(x,y,f)
    
    return sl_psd

**Problem 2d**

Plot the string length periodogram for the simulated data. Does it make sense?

*Hint - think about the optimal grid from Notebook III*
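
As a reminder of how such a grid can be built, here is a rough sketch. It assumes the common heuristic that periodogram peaks have a width of about $1/T$ for a baseline $T$, so the grid spacing is $1/(n_0 T)$ with an oversampling factor $n_0$. The function name `make_freq_grid`, the default `oversample=5`, and the observation times are illustrative choices, not a prescription.

```python
import numpy as np

def make_freq_grid(t, f_max=10.0, oversample=5):
    """Uniform frequency grid with spacing 1/(oversample * T)."""
    baseline = np.ptp(t)              # total duration T of the observations
    f_min = 1 / baseline              # one full cycle over the baseline
    df = 1 / (oversample * baseline)  # sample each ~1/T-wide peak several times
    return np.arange(f_min, f_max, df)

t = np.array([0.0, 3.2, 10.5, 27.1, 44.8, 61.0])  # illustrative times (d)
f_grid = make_freq_grid(t)
print(len(f_grid), f_grid[0], f_grid[1] - f_grid[0])
```

A grid that is too coarse can step right over a narrow peak, while a much finer grid mostly adds computation.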

In [11]:
f_grid = np.arange(1/np.ptp(t_obs), 10, 1/5/np.ptp(t_obs))
sl_psd = sl_periodogram(t_obs, y, f_grid)

fig, ax = plt.subplots()
ax.plot(1/f_grid, sl_psd, '0.2', lw=2)
ax.set_xlabel('Period (d)')
ax.set_ylabel('String Length')

ax.axline((0.7,np.mean(sl_psd)), (0.7, np.mean(sl_psd)+1e-3), 
          color='DarkOrange',  lw=1, ls='--')

axins = plt.axes([.29, .22, .65, .27])
axins.plot(1/f_grid, sl_psd, '0.2', lw=2)
axins.axline((0.7,np.mean(sl_psd)), (0.7, np.mean(sl_psd)+1e-3), 
          color='DarkOrange',  lw=1, ls='--')
axins.set_xlim(0,3)

fig.tight_layout()




## Problem 3) Phase Dispersion Minimization

In [8]:
def calc_pdm(x, y, f=1, bins=10):
    '''Calculate the phase dispersion minimization for observations at frequency f
    
    Parameters
    ----------
    x : array-like
        input time of observations
    
    y : array-like
        measured signal at input x
    
    f : float (default=1)
        frequency of the test period
    
    bins : int (default=10)
        number of equal-width phase bins
    Returns
    -------
    pdm : float
        the sum of the scatter in each bin
    
    '''
    
    phases = x*f % 1
    pdm = 0
    for bin_num in range(bins):
        this_bin = np.where((phases >= bin_num/bins) & 
                            (phases < (bin_num+1)/bins))
        if len(this_bin[0]) > 1:
            pdm += np.std(y[this_bin], ddof=1)/bins
    return pdm

In [9]:
def pdm_periodogram(x, y, f_grid = np.linspace(0.1,10,10), **kwargs):
    '''Calculate the phase dispersion minimization "periodogram"
    
    Parameters
    ----------
    x : array-like
        input time of observations
    
    y : array-like
        measured signal at input x
    
    f_grid : array_like (default=np.linspace(0.1,10,10))
        frequency grid for the period
        
    Returns
    -------
    pdm_psd : array_like
        PDM at every test frequency f
    
    '''
    
    pdm_psd = np.zeros_like(f_grid)
    total_rms = np.std(y, ddof=1)
    for f_num, f in enumerate(f_grid):
        pdm_psd[f_num] = calc_pdm(x,y,f, **kwargs)/total_rms
    
    return pdm_psd

In [10]:
f_grid = np.arange(1/np.ptp(t_obs), 10, 1/5/np.ptp(t_obs))
pdm_psd = pdm_periodogram(t_obs, y, f_grid)

fig, ax = plt.subplots()
ax.plot(1/f_grid, pdm_psd, '0.2', lw=2)
ax.set_xlabel('Period (d)')
ax.set_ylabel('PDM statistic')
ax.axline((0.7,np.mean(pdm_psd)), (0.7, np.mean(pdm_psd)+1e-3), 
          color='DarkOrange',  lw=1, ls='--')


axins = plt.axes([.29, .22, .65, .27])
axins.plot(1/f_grid, pdm_psd, '0.2', lw=2)
axins.axline((0.7,np.mean(pdm_psd)), (0.7, np.mean(pdm_psd)+1e-3), 
          color='DarkOrange',  lw=1, ls='--')
axins.set_xlim(0,3)
fig.tight_layout()

print(f'The best-fit period is {1/f_grid[np.argmin(pdm_psd)]:.4f} d')


The best-fit period is 0.6994 d




## Problem 4) Analysis of Variance

Analysis of Variance (AOV; [Schwarzenberg-Czerny 1989](http://adsabs.harvard.edu/abs/1989MNRAS.241..153S)) is similar to PDM. Optimal periods are defined via hypothesis testing, and these methods are found to perform best for certain types of astronomical signals.
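
For intuition, the sketch below computes a one-way analysis-of-variance $F$ ratio on phase bins: the variance of the bin means about the global mean, divided by the variance of the observations about their own bin mean. It assumes the standard ANOVA form with equal-width bins. The name `aov_statistic`, the bin count, and the simulated data are illustrative assumptions, and this is not a reproduction of any particular published implementation.

```python
import numpy as np

def aov_statistic(t, y, f, n_bins=10):
    """One-way ANOVA F ratio for data folded at frequency f (sketch)."""
    phase = (t * f) % 1
    bin_idx = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    grand_mean = np.mean(y)
    between, within, n_used = 0.0, 0.0, 0
    for b in range(n_bins):
        y_bin = y[bin_idx == b]
        if len(y_bin) == 0:
            continue                  # skip empty phase bins
        n_used += 1
        between += len(y_bin) * (np.mean(y_bin) - grand_mean) ** 2
        within += np.sum((y_bin - np.mean(y_bin)) ** 2)
    # between-bin variance over within-bin variance; large values favor f
    return (between / (n_used - 1)) / (within / (len(y) - n_used))

rng = np.random.default_rng(1)
t = rng.uniform(0, 60, size=200)
y = 2 * np.sin(2 * np.pi * t / 0.7) + rng.normal(0, 0.3, size=200)

aov_true = aov_statistic(t, y, 1 / 0.7)    # true frequency
aov_wrong = aov_statistic(t, y, 1 / 0.53)  # arbitrary wrong frequency
print(aov_true, aov_wrong)
```

Unlike PDM, where the best period minimizes the statistic, the best period here maximizes it.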

## Problem 5) Supersmoother

Supersmoother ([Reimann 1994](http://adsabs.harvard.edu/abs/1994PhDT........20R)) is a least-squares approach wherein a flexible, non-parametric model is fit to the folded observations at many trial frequencies. The use of this flexible model reduces aliasing issues relative to models that assume a sinusoidal shape; however, this comes at the cost of considerable computational time. 

Briefly, supersmoother provides a smooth estimate of the data via localized linear regression. Observations are then compared to the smooth model value to identify the model that optimally reduces the sum of the square of the residuals. 

This approach requires the use of some adopted span, essentially the region over which the linear fit is performed. The "magic" in supersmoother is that it uses cross-validation to identify the optimal span at every location within the data set.

The pseudo-code procedure is:
  1. create 3 smooth local linear estimates of y at every position x with span = 0.05, 0.2, and 0.5
  2. identify the best span at every position, x, based on the residuals
  3. smooth the best span curve from (2) with span = 0.2
  4. create a "final" smooth estimate by interpolating between the two smooth curves closest in value to (3)

In [19]:
np.random.seed(185)
t_obs = np.random.uniform(0, 60, size=120)
phi = np.pi
var_y = 0.81
y = gen_periodic_data(t_obs, period=0.7, amplitude=3, phase=phi, noise=var_y) 
y += gen_periodic_data(t_obs, period=0.7/2, amplitude=2, phase=phi+np.pi/2)
y += gen_periodic_data(t_obs, period=0.7/3, amplitude=1, phase=phi)
y_unc = np.ones_like(y)*np.sqrt(var_y)
phase_plot(t_obs, y, 0.7, y_unc)


**Problem 5a**

Write a function `smooth` that estimates the value of `y` at every phase `phase` via a linear least squares fit to all the observations within $\pm$`span`/2 of `phase`. The observed value of `y` at phase `phase` should be excluded from the fit. 

*Hint* - it may be helpful to input `x` and `f` so the phase can be calculated within the function.

In [59]:
def smooth(y, x, f, span=0.05, y_unc=None):
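    '''Calculate the smooth
    
    Parameters
    ----------
    y : array-like
        measured signal at input x
    
    x : array-like
        input time of observations
    
    f : float 
        frequency for which to calculate the smooth
    
    span : float (default=0.05)
        full width, in phase, of the window for each local linear fit
    
    y_unc : array-like (default=None)
        uncertainties on y, used to weight the fits
        
    Returns
    -------
    smooth : array_like
        smooth estimate of y at the phase of every observation
    '''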
    
    if type(y_unc) == int:
        y_unc = np.ones_like(y)*y_unc
        
    phases = (x*f) % 1
    smooth = np.empty_like(x)
    for obs_num, phase in enumerate(phases):
        this_fit = np.where((phases >= phase - span/2) & 
                            (phases < phase + span/2) & 
                            (phases != phase) & 
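                            # Kludge for numerical stability: require at least
                            # one observation below this phase within the span
                            (len(np.where((phases >= phase - span/2) & 
                                          (phases < phase))[0]) > 0
                            ) & 
                            # and at least one observation above it, so the
                            # fit interpolates rather than extrapolates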
                            (len(np.where((phases < phase + span/2) & 
                                          (phases > phase))[0]) > 0
                            )
                           )
        # catch instances where there is not enough data to fit a line
        if len(this_fit[0]) > 1:
            if y_unc is not None:
                lin_fit = np.poly1d(np.polyfit(phases[this_fit],
                                               y[this_fit],
                                               1, 
                                               w = 1/y_unc[this_fit]))
            else:
                lin_fit = np.poly1d(np.polyfit(phases[this_fit],
                                               y[this_fit],
                                               1))
            smooth[obs_num] = lin_fit(phase)
        else:
            smooth[obs_num] = -np.inf

        
    # use linear interpolation to fill in missing smooth
    missing_smooth = np.isinf(smooth)
    smooth[missing_smooth] = np.interp(phases[missing_smooth], 
                                       np.sort(phases[~missing_smooth]), 
                                       smooth[~missing_smooth][np.argsort(phases[~missing_smooth])])

        
    return smooth

**Problem 5b**

Plot the smooth representation of the data with spans of 0.05, 0.2, and 0.5 folded at a period of 0.7 d. 

In [74]:
y_unc = 0.1*np.ones_like(y)
phases = (t_obs/0.7) % 1

smooth_tweeter = smooth(y, t_obs, 1/0.7, span=0.05, y_unc=y_unc)
smooth_midrange = smooth(y, t_obs, 1/0.7, span=0.2, y_unc=y_unc)
smooth_woofer = smooth(y, t_obs, 1/0.7, span=0.5, y_unc=y_unc)

phase_plot(t_obs, y, 0.7, y_unc = 0.1*np.ones_like(y))
plt.plot(np.sort(phases), smooth_tweeter[np.argsort(phases)], 
       label="span = 0.05")
plt.plot(np.sort(phases), smooth_midrange[np.argsort(phases)], 
       label="span = 0.2")
plt.plot(np.sort(phases), smooth_woofer[np.argsort(phases)], 
       label="span = 0.5")
plt.legend()


**Problem 5c**

Identify the best span at every phase via the residuals. 

In [75]:
smooth_list = np.vstack([[smooth_tweeter], 
                         [smooth_midrange],
                         [smooth_woofer]])
span_list = np.vstack([[np.ones_like(smooth_tweeter)*0.05], 
               [np.ones_like(smooth_midrange)*0.2],
               [np.ones_like(smooth_woofer)*0.5]])
resid = np.abs(y - smooth_list)

In [76]:
span_midrange = smooth(span_list[np.argmin(resid, axis=0), 0], t_obs, 1/0.7, span=0.2)

In [77]:
plt.figure()
plt.plot(np.sort(phases), span_midrange[np.argsort(phases)])


In [78]:
supersmooth = np.empty_like(smooth_midrange)
for sm_num, sm in enumerate(span_midrange):
    supersmooth[sm_num] = np.interp(sm, 
                                    span_list.T[sm_num], 
                                    smooth_list.T[sm_num])

In [79]:
y_unc = 0.1*np.ones_like(y)
phases = (t_obs/0.7) % 1

smooth_tweeter = smooth(y, t_obs, 1/0.7, span=0.05, y_unc=y_unc)
smooth_midrange = smooth(y, t_obs, 1/0.7, span=0.2, y_unc=y_unc)
smooth_woofer = smooth(y, t_obs, 1/0.7, span=0.5, y_unc=y_unc)

phase_plot(t_obs, y, 0.7, y_unc = 0.1*np.ones_like(y))
plt.plot(np.sort(phases), smooth_tweeter[np.argsort(phases)], 
       label="span = 0.05")
plt.plot(np.sort(phases), smooth_midrange[np.argsort(phases)], 
       label="span = 0.2")
plt.plot(np.sort(phases), smooth_woofer[np.argsort(phases)], 
       label="span = 0.5")
plt.plot(np.sort(phases), supersmooth[np.argsort(phases)], lw=5,
         color='k', zorder=10, 
         label='supersmooth')
plt.legend()


In [100]:
def calc_supersmooth(y, x, f, spans=[0.05, 0.2, 0.5], y_unc=None):
    '''Calculate the supersmoother estimate of y folded at frequency f'''
    # smooth the data with each of the trial spans
    smooth_list = np.vstack([smooth(y, x, f, span=s, y_unc=y_unc) for s in spans])
    span_list = np.ones_like(smooth_list)*np.array(spans)[:,None]
    
    # identify the best span at every observation via the residuals
    resid = np.abs(y - smooth_list)
    
    # smooth the best-span curve using the middle span
    span_midrange = smooth(span_list[np.argmin(resid, axis=0), 0], 
                           x, f, span=np.median(spans))
    
    # interpolate between the smooths at the locally optimal span
    supersmooth = np.empty_like(span_midrange)
    for sm_num, sm in enumerate(span_midrange):
        supersmooth[sm_num] = np.interp(sm, 
                                        span_list.T[sm_num], 
                                        smooth_list.T[sm_num])
    
    return supersmooth

In [101]:
def supersmooth_periodogram(y, y_unc, x, f_grid):
    psd = np.empty_like(f_grid)
    chi2_0 = np.sum(((y - np.mean(y))/y_unc)**2)
    
    for f_num, f in enumerate(f_grid):
        supersmooth = calc_supersmooth(y, x, f, y_unc=y_unc)
        chi2 = np.sum((y - supersmooth)**2/y_unc**2)
        psd[f_num] = 0.5*(chi2_0 - chi2)
    
    return psd

In [109]:
f_grid = np.linspace(0.6, 3, 300)

ss_psd = supersmooth_periodogram(y, y_unc*8.1, t_obs, f_grid)

plt.figure()
plt.plot(1/f_grid, ss_psd)
plt.tight_layout()


Fortunately, Jake VanderPlas has created a Python implementation, [SuperSmoother](https://www.astroml.org/gatspy/periodic/supersmoother.html), if you are interested in using this method. 

## Problem 6) Bayesian Methods

There have been some efforts to frame the period-finding problem in a Bayesian framework. [Bretthorst 1988](https://www.springer.com/us/book/9780387968711) developed Bayesian generalized LS models, while [Gregory & Loredo 1992](http://adsabs.harvard.edu/abs/1992ApJ...398..146G) applied Bayesian techniques to phase-binned models. 

More recently, efforts to use Gaussian processes (GPs) to model and extract a period from the light curve have been developed ([Wang et al. 2012](http://adsabs.harvard.edu/abs/2012ApJ...756...67W)). These methods have proved to be especially useful for detecting stellar rotation in Kepler light curves ([Angus et al. 2018](http://adsabs.harvard.edu/abs/2018MNRAS.474.2094A)). 


## The Quasi-Periodic Kernel

As we saw in the first lecture, there are many sources with periodic light curves that are not strictly sinusoidal. Thus, the use of the cosine kernel (on its own) may not be sufficient to model the signal. As Suzanne told us during the session, the quasi-periodic kernel: 

$$K_{ij} = k(x_i - x_j) = \exp \left(-\Gamma \sin^2\left[\frac{\pi}{P} \left|x_i - x_j\right|\right]\right)$$

is useful for non-sinusoidal signals. We will now use this kernel to model the variations in the simulated data.
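
To see what this kernel encodes, the sketch below evaluates $K_{ij}$ directly with `numpy` for a handful of times. The function name `periodic_kernel` and the values of $\Gamma$ and $P$ are illustrative assumptions. In practice the kernel is supplied by `george`, so this is only a hand-rolled check of the formula.

```python
import numpy as np

def periodic_kernel(x, gamma, period):
    """K_ij = exp(-gamma * sin^2(pi * |x_i - x_j| / period))."""
    dx = np.abs(x[:, None] - x[None, :])
    return np.exp(-gamma * np.sin(np.pi * dx / period) ** 2)

x = np.array([0.0, 0.35, 0.7, 1.4])   # illustrative times (d)
K = periodic_kernel(x, gamma=2.0, period=0.7)
print(np.round(K, 3))
```

Observations separated by an integer number of periods are perfectly correlated ($K_{ij} = 1$), while observations half a period apart have the minimum covariance, $e^{-\Gamma}$. Larger $\Gamma$ lets the function vary more rapidly within a single period.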

In [5]:
lc = pd.read_csv("example_asas_lc.dat")

**Problem 6a**

Write a function `lnprob` to calculate the log posterior given model parameters $\theta$ and data `x, y, yerr`.

*Hint* - it may be useful to write this out as multiple functions.

In [19]:
def model(theta, t):
    _, _, b, _ = theta
    return b

def lnlike(theta, t, y, yerr):
    lnper, lna, b, lngamma = theta
    gp = george.GP(np.exp(lna) * kernels.ExpSine2Kernel(np.exp(lngamma), lnper))  # ExpSine2Kernel takes gamma and the log of the period
    gp.compute(t, yerr)
    return gp.lnlikelihood(y - model(theta, t), quiet=True)

def lnprior(theta):
    lnper, lna, b, lngamma = theta
    if (-20 < lna < 20 and 
        -20 < b < 20 and 
        -20 < lngamma < 20 and
        -10 < lnper < np.log(10)):
        return 0.0
    return -np.inf

def lnprob(p, x, y, yerr):
    lp = lnprior(p)
    return lp + lnlike(p, x, y, yerr) if np.isfinite(lp) else -np.inf

In [20]:
def lnprior(theta):
    lnper, lna, b, lngamma = theta
    if (-20 < lna < 20 and 
        0 < b < 20 and 
        -20 < lngamma < 20 and
        -10 < lnper < np.log(np.ptp(lc['hjd']))):
        return 0.0
    return -np.inf

Because we have no idea where to initialize our walkers in this case, we are going to use an ad hoc common sense + brute force approach. 

**Problem 6c**

Run `LombScargle` on the data and determine the top three peaks in the periodogram. Set `nterms` = 2, and the maximum frequency to 5 (this is arbitrary but sufficient in this case).

*Hint* - you may need to search more than the top 3 periodogram values to find the 3 peaks.
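
The highest few periodogram values often sit on the same peak, so sorting by power alone returns near-duplicate periods. One possible way around this, sketched below on a toy periodogram, is to first select local maxima and then rank them by power. The helper `top_peaks` and the Gaussian bumps are illustrative assumptions and are not part of `astropy`.

```python
import numpy as np

def top_peaks(frequency, power, n_peaks=3):
    """Frequencies of the n_peaks highest local maxima in a periodogram."""
    # interior points higher than the left neighbor and not below the right
    is_peak = (power[1:-1] > power[:-2]) & (power[1:-1] >= power[2:])
    peak_idx = np.where(is_peak)[0] + 1
    # rank the local maxima by power, highest first
    ranked = peak_idx[np.argsort(power[peak_idx])[::-1]]
    return frequency[ranked[:n_peaks]]

# toy periodogram: three Gaussian bumps of decreasing height
frequency = np.linspace(0.1, 5, 2000)
power = (1.0 * np.exp(-0.5 * ((frequency - 1.36) / 0.01) ** 2)
         + 0.6 * np.exp(-0.5 * ((frequency - 2.72) / 0.01) ** 2)
         + 0.3 * np.exp(-0.5 * ((frequency - 4.72) / 0.01) ** 2))

peaks = top_peaks(frequency, power)
print(1 / peaks)
```

On real, noisy periodograms there are many small local maxima, so it may also help to require a minimum separation between the selected peaks.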

In [21]:
from astropy.timeseries import LombScargle

frequency, power = LombScargle(lc['hjd'], lc['mag'], lc['mag_unc'], nterms=2).autopower(maximum_frequency=5)

print('Top LS period is {}'.format(1/frequency[np.argmax(power)]))
print(1/frequency[np.argsort(power)[::-1][0:5]])

Top LS period is 0.7350856847545221
[0.73508568 0.36753832 0.36754736 0.2117088  0.2117058 ]


**Problem 6d**

Initialize one third of your 150 walkers around each of the periods identified in the previous problem. 

Run the MCMC for 500 steps following this initialization.

In [43]:
initial1 = np.array([np.log(0.735), 1, 10, 1])
ndim = len(initial1)
nwalkers = 50
p1 = [np.array(initial1) + 1e-4 * np.random.randn(ndim)
      for i in range(nwalkers)]

initial2 = np.array([np.log(0.367), 1, 10, 1])
ndim = len(initial2)
nwalkers = 50
p2 = [np.array(initial2) + 1e-4 * np.random.randn(ndim)
      for i in range(nwalkers)]

initial3 = np.array([np.log(0.211), 1, 10, 1])
ndim = len(initial3)
nwalkers = 50
p3 = [np.array(initial3) + 1e-4 * np.random.randn(ndim)
      for i in range(nwalkers)]
p0 = p1+p2+p3

nwalkers = len(p0)



In [23]:
ncores = 8
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, 
                                args=(lc['hjd'],lc['mag'],lc['mag_unc']), 
                                threads=ncores)


In [44]:
filename = 'tmp.h5'
backend = emcee.backends.HDFBackend(filename)
backend.reset(nwalkers, ndim)        

In [45]:
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, 
                                args=(lc['hjd'],lc['mag'],lc['mag_unc']), 
                                backend=backend)
for sample in sampler.sample(p0, iterations=500, thin_by=1, progress=True):
    continue

100%|████████████████████████████| 500/500 [07:35<00:00,  1.10it/s]


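import time
from multiprocessing import Pool

# Toy problem to test multiprocessing with emcee. The names below are kept
# distinct from nwalkers, ndim, and sampler so the GP fit is not overwritten.

def log_prob(theta):
    # artificially slow log probability: wait 5-8 ms before returning
    t = time.time() + np.random.uniform(0.005, 0.008)
    while True:
        if time.time() >= t:
            break
    return -0.5 * np.sum(theta ** 2)

np.random.seed(42)
initial = np.random.randn(32, 5)
test_nwalkers, test_ndim = initial.shape
nsteps = 100


In [35]:
import os

# prevent numpy from starting its own threads inside each worker
os.environ["OMP_NUM_THREADS"] = "1"

In [36]:
# NOTE: where multiprocessing uses the "spawn" start method (e.g., macOS),
# workers cannot unpickle functions defined in the notebook itself
with Pool() as pool:
    test_sampler = emcee.EnsembleSampler(test_nwalkers, test_ndim, log_prob, pool=pool)
    start = time.time()
    test_sampler.run_mcmc(initial, nsteps, progress=True)
    end = time.time()
    multi_time = end - start
    print("Multiprocessing took {0:.1f} seconds".format(multi_time))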



In [25]:
from multiprocessing import Pool

In [30]:
with Pool() as pool:
    sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, 
                                args=(lc['hjd'],lc['mag'],lc['mag_unc']), pool=pool)
    for sample in sampler.sample(p0, iterations=500, thin_by=1, progress=True):
        continue


**Problem 6e**

Plot the chains.

In [46]:
paramsNames = ['ln(P)', 'ln(a)', 'b', r'$\ln(\gamma)$']
nburn = 350
plotChains(sampler, nburn, paramsNames)


**Problem 6f** 

Plot $\ln P$ vs. log posterior. 

In [47]:
chain_lnp_end = sampler.get_chain()[-1,:,0]
chain_lnprob_end = sampler.get_log_prob()[-1,:]
fig, ax = plt.subplots()
ax.scatter(chain_lnp_end, chain_lnprob_end, alpha=0.1)
ax.set_xlabel('ln(P)')
ax.set_ylabel('ln(Probability)')
fig.tight_layout()

<IPython.core.display.Javascript object>

**Problem 6g**

Reinitialize the walkers around the previous walker with the maximum posterior value. 

Run the MCMC for 500 steps. Plot the chains. Have they converged?
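
Beyond inspecting the chains by eye, a numerical diagnostic can help. The sketch below computes the Gelman-Rubin $\hat{R}$ statistic for a single parameter, on simulated chains rather than on the sampler output. Treat it as a rough check only: the walkers in an ensemble sampler are not independent chains, and `emcee` itself provides `sampler.get_autocorr_time()` for estimating the integrated autocorrelation time. The function name `gelman_rubin` and the simulated chains are illustrative assumptions.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for one parameter; chains has shape (n_steps, n_chains)."""
    n_steps, n_chains = chains.shape
    chain_means = chains.mean(axis=0)
    W = chains.var(axis=0, ddof=1).mean()   # mean within-chain variance
    B = n_steps * chain_means.var(ddof=1)   # between-chain variance
    var_hat = (n_steps - 1) / n_steps * W + B / n_steps
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(2)
mixed = rng.normal(0, 1, size=(500, 8))   # chains sampling the same distribution
stuck = mixed + np.arange(8)              # chains offset from one another

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(rhat_mixed, rhat_stuck)
```

Values of $\hat{R}$ close to 1 are consistent with chains that agree with one another, while values well above 1 indicate that different chains are still exploring different regions. An array such as `sampler.get_chain()[:, :, 0]` has the `(n_steps, n_chains)` shape this function expects.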

In [48]:
# final position of the walker with the maximum posterior value
p = sampler.get_chain()[-1, np.argmax(chain_lnprob_end)]
sampler.reset()

In [49]:
p0 = [p + 1e-8 * np.random.randn(ndim) for i in range(nwalkers)]
sampler.reset()
for sample in sampler.sample(p0, iterations=500, thin_by=1, progress=True):
    continue

100%|████████████████████████████| 500/500 [07:51<00:00,  1.06it/s]


In [57]:
paramsNames = ['ln(P)', 'ln(a)', 'b', r'$\ln(\gamma)$']
nburn = 250
plotChains(sampler, nburn, paramsNames)
plt.tight_layout()


**Problem 6h**

Make a corner plot of the samples. What is the marginalized estimate for the period of this source? 

How does this estimate compare to LS?


In [54]:
samples = sampler.get_chain(discard=nburn, flat=True)
plot_samples = samples.copy()
# convert ln(P) to P for plotting
plot_samples[:,0] = np.exp(samples[:,0])
fig = corner.corner(plot_samples, labels=['P (d)'] + paramsNames[1:], quantiles=[0.16,0.50,0.84])


In [56]:
p16, p50, p84 = np.percentile(samples[:,0], [16,50,84])

print('ln(P) = {:.6f} +{:.6f} -{:.6f}'.format(p50, p84-p50, p50-p16))

print('GP Period = {:.6f}'.format(np.exp(p50)))

ln(P) = -0.307769 +0.000001 -0.000001
GP Period = 0.735085


The cell below shows marginalized samples overplotted on the actual data. How well does the model perform?

In [64]:
fig, ax = plt.subplots(figsize=(12,6))
ax.errorbar(lc['hjd'], lc['mag'], lc['mag_unc'], fmt='o')
ax.set_xlabel('HJD (d)')
ax.set_ylabel('mag')

hjd_grid = np.linspace(4790, 4850,5000)

for s in samples[np.random.randint(len(samples), size=5)]:
    # Set up the GP for this sample.
    lnper, lna, b, lngamma = s
    gp = george.GP(np.exp(lna) * kernels.ExpSine2Kernel(np.exp(lngamma), lnper))
    gp.compute(lc['hjd'], lc['mag_unc'])
    # Compute the prediction conditioned on the observations and plot it.
    m = gp.sample_conditional(lc['mag'] - model(s, lc['hjd']), hjd_grid) + model(s, hjd_grid)
    
    ax.plot(hjd_grid, m, color="0.2", alpha=0.3)
ax.set_xlim(4803, 4832)
ax.set_ylim(11.35, 10.8)
fig.tight_layout()


Now you have the tools to fit a GP to a light curve and get an estimate of the best-fit period (and an estimate of the uncertainty on that period to boot!). 

As previously noted, you should be a bit worried about "burn in" and how the walkers were initialized throughout. If you plan to use GPs to search for periods in your own work, I highly recommend you read Angus et al. 2018 on the GP periodogram. Angus et al. provide far more intelligent methods for initializing the MCMC than what is presented here.