# Simulate Boyajian's Star in ZTF

First, let's start with a simple simulation of Boyajian's star in ZTF.

I'm going to
1) download the Kepler light curve for Boyajian's star
2) download some random ZTF light curves
3) use the ZTF time stamps to resample the existing Kepler data at a ZTF cadence

In the second step:
4) make sure the error bars match those of ZTF for a given magnitude

In the third step, expand the simulations:
5) move the prominent dips around in the ZTF observation time
6) make light curves with different median magnitudes and corresponding error bars (make a scatter plot of light curve magnitudes and their uncertainties, then define a Gaussian with the right mean and variance)
7) Maybe define a more general model for Boyajian's star-like light curves and simulate from that?
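
One possible reading of step 6 is sketched below: bin the (magnitude, uncertainty) pairs by magnitude, record the mean and standard deviation of the uncertainty in each bin, and draw new error bars from a Gaussian with those parameters. The data here are synthetic stand-ins, and the function names `fit_error_model` and `draw_magerr` are made up for illustration; this is not the final method.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for real ZTF data (an assumption): magnitudes and
# uncertainties that grow towards fainter magnitudes, with 10% scatter.
mags = rng.uniform(15.0, 20.5, size=5000)
magerrs = 0.01 * 10 ** (0.2 * (mags - 15.0)) * rng.normal(1.0, 0.1, size=5000)

def fit_error_model(mags, magerrs, bin_edges):
    """Mean and standard deviation of the uncertainty in each magnitude bin."""
    which_bin = np.digitize(mags, bin_edges) - 1
    nbins = len(bin_edges) - 1
    means = np.array([magerrs[which_bin == b].mean() for b in range(nbins)])
    stds = np.array([magerrs[which_bin == b].std() for b in range(nbins)])
    return means, stds

def draw_magerr(mag, bin_edges, means, stds, rng):
    """Draw one uncertainty for a light curve with median magnitude `mag`."""
    b = np.clip(np.digitize(mag, bin_edges) - 1, 0, len(means) - 1)
    # a Gaussian draw can be negative, so take the absolute value
    return np.abs(rng.normal(means[b], stds[b]))

bin_edges = np.arange(15.0, 21.0, 0.5)  # 0.5 mag wide bins from 15 to 20.5
means, stds = fit_error_model(mags, magerrs, bin_edges)
sim_err = draw_magerr(18.2, bin_edges, means, stds, rng)
```

With real data, `mags` and `magerrs` would come from the ZTF light curves themselves, and the bin width would need tuning so that no bin is empty.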

#### Open Questions
* Do we need to restrict ourselves in galactic latitude? Start by pulling $N$ random objects from across the sky

## Imports

In [1]:
%matplotlib notebook
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")

import glob

import numpy as np
import pandas as pd
import lightkurve


## Getting the Kepler Data

We're going to get the Kepler data using the `lightkurve` package:

In [2]:
from lightkurve import search_lightcurvefile
target = 'KIC 8462852'

Let's try and download the data for all quarters. 

In [3]:
lc = search_lightcurvefile(target, quarter=1).download().PDCSAP_FLUX
for q in range(2,18):
    lc = lc.append(search_lightcurvefile(target, quarter=q).download().PDCSAP_FLUX)

In [4]:
lc.scatter();


The `normalize` method divides each quarter's flux by its median, which puts all quarters on the same relative flux scale (median flux of 1):

In [5]:
lc = search_lightcurvefile(target, quarter=1).download().PDCSAP_FLUX.normalize()
for q in range(2,18):
    lc = lc.append(search_lightcurvefile(target, quarter=q).download().PDCSAP_FLUX.normalize())
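
To illustrate why per-quarter normalization matters, here is a minimal numpy sketch with two made-up "quarters" at different absolute flux levels. The flux values and the helper `normalize_flux` are assumptions for illustration only; `lightkurve` does the real work in the cell above.

```python
import numpy as np

# Two made-up "quarters" with different absolute flux levels (assumed values)
quarter_a = np.array([1000.0, 1002.0, 998.0, 800.0, 1001.0])
quarter_b = np.array([1500.0, 1497.0, 1503.0, 1499.0, 1501.0])

def normalize_flux(flux):
    """Divide by the median so that the typical flux level becomes 1."""
    return flux / np.nanmedian(flux)

# after normalization both quarters sit at ~1 and can be stitched together
stitched = np.concatenate([normalize_flux(quarter_a), normalize_flux(quarter_b)])
```

The dip in the first quarter keeps its relative depth (0.8 of the median), while the offset between the quarters disappears.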

In [6]:
lc.scatter();


Let's save this to a file:

In [7]:
!pwd

/data/epyc/users/dhuppenk/repositories/ZTF_Boyajian/notebooks


In [8]:
np.savetxt("./boyajians_star_kepler_normalized.dat", np.array([lc.time, lc.flux]).T)

In [9]:
fig, ax = plt.subplots(1, 1, figsize=(10,4))
ax.scatter(lc.time, -2.5*np.log10(lc.flux)+10, c="black", s=5)
ax.invert_yaxis()
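
The plot above converts normalized flux to a magnitude with an arbitrary zero point of 10. As a quick worked example of that conversion, the sketch below computes how deep a 20% flux dip is in magnitudes. The helper name `flux_to_mag` is made up for illustration.

```python
import numpy as np

def flux_to_mag(norm_flux, zero_mag=10.0):
    """Convert normalized flux to magnitude with an arbitrary zero point."""
    return -2.5 * np.log10(norm_flux) + zero_mag

# depth of a dip to 80% of the normal flux, in magnitudes
dip_depth_mag = flux_to_mag(0.8) - flux_to_mag(1.0)
print(round(dip_depth_mag, 3))  # 0.242
```

So a 20% dip in flux corresponds to a dimming of roughly 0.24 mag, independent of the zero point.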


  


### Function to get the Kepler data

Let's make a function to get the Kepler light curve:

In [10]:
def get_kepler_lightcurve():
    target = 'KIC 8462852'
    lc = search_lightcurvefile(target, quarter=1).download().PDCSAP_FLUX.normalize()
    for q in range(2,18):
        lc = lc.append(search_lightcurvefile(target, quarter=q).download().PDCSAP_FLUX.normalize())
        
    return lc

In [11]:
kepler_lc = get_kepler_lightcurve()

In [12]:
kepler_lc = kepler_lc.remove_nans()

## ZTF Cadences

Kyle made a pickle file with some ZTF cadence information:

In [13]:
datadir = "/epyc/data/boyajian/"

pickle_file = "sample_mjds_2.pkl"

In [14]:
ztf_cadence = pd.read_pickle(datadir + pickle_file)

In [15]:
ztf_cadence.head()

Unnamed: 0,mjd_g,mag_g,magerr_g,mjd_r,mag_r,magerr_r
0,"[58204.201331, 58206.1338889, 58450.4353819, 5...","[18.049786, 17.98969, 17.893917, 17.893661, 17...","[0.043785494, 0.042514425, 0.04061343, 0.04060...","[58475.45729171111, 58450.3772454, 58475.45774...","[17.445889, 17.433268, 17.45091, 17.46019, 17....","[0.029131362, 0.029901987, 0.029236319, 0.0294..."
1,"[58208.4476505, 58208.4842824, 58234.3558912, ...","[16.587982, 16.418648, 16.603691, 16.678661, 1...","[0.027895203, 0.027219111, 0.02796397, 0.02830...","[58502.56362271111, 58502.56866901111, 58502.5...","[15.69869, 15.667656, 15.66051, 15.677812, 15....","[0.012726132, 0.012664719, 0.012650874, 0.0148..."
2,"[58598.42826391111, 58598.42871531111, 58246.3...","[20.503447, 20.104498, 20.468098, 20.198317, 2...","[0.1468122, 0.12067488, 0.120341174, 0.1270227...","[58694.26076391111, 58723.21578701111, 58217.4...","[19.5084, 19.513474, 19.416594, 19.455902, 19....","[0.08006927, 0.08037927, 0.08319566, 0.0769301..."
3,"[58205.4712847, 58246.3551505, 58728.162881911...","[19.52711, 19.600018, 19.734375, 19.756207, 19...","[0.07429419, 0.07745236, 0.097713016, 0.084637...","[58694.26076391111, 58723.21578701111, 58666.3...","[18.070387, 18.075138, 18.075306, 18.28212, 18...","[0.029745415, 0.029832937, 0.029836038, 0.0340..."
4,"[58599.37923613333, 58235.417419, 58274.377963...","[20.6547, 19.874813, 20.611324, 20.21209, 20.3...","[0.14792758, 0.09857254, 0.13027763, 0.1135262...","[58700.21489581111, 58686.27434031111, 58700.2...","[19.047123, 19.245966, 19.162663, 19.18438, 19...","[0.06769094, 0.07629721, 0.07256523, 0.0735206..."


Okay, so that's a dataframe where each row is one object, and each cell holds an array of MJDs, magnitudes or magnitude errors in the g- or r-band. How long is this dataframe?

In [16]:
len(ztf_cadence)

32789

Haha, cool, we have lots of cadence information to play with. Let's pick one at random:

In [17]:
np.random.seed(1000)

In [18]:
n_cadence = len(ztf_cadence)

In [19]:
idx = np.random.randint(0, n_cadence)
print(idx)

4695


In [20]:
def get_ztf_lightcurve(ztf_df, idx):
    """
    Get a dictionary with ZTF points in r and g bands out 
    of the cadence data frame
    
    Parameters
    ----------
    ztf_df : pd.DataFrame
        A DataFrame with the ZTF data, has columns `mjd_g`, `mag_g`, 
        `magerr_g`, `mjd_r`, `mag_r`, `magerr_r`
        
    idx: int
        An index in ztf_df.index to choose a particular light curve
        
    Returns
    -------
    ztf_lc : dict
        A dictionary with the data in a given row of the DataFrame
    """
    
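    ztf_line = ztf_df.loc[idx]

    # access the columns by name and make sure we have numpy arrays
    mjd_g = np.asarray(ztf_line["mjd_g"])
    mag_g = np.asarray(ztf_line["mag_g"])
    magerr_g = np.asarray(ztf_line["magerr_g"])

    # sort the g-band points in time
    g_idx = np.argsort(mjd_g)
    mjd_g = mjd_g[g_idx]
    mag_g = mag_g[g_idx]
    magerr_g = magerr_g[g_idx]
    
    mjd_r = np.asarray(ztf_line["mjd_r"])
    mag_r = np.asarray(ztf_line["mag_r"])
    magerr_r = np.asarray(ztf_line["magerr_r"])

    # sort the r-band points in time
    r_idx = np.argsort(mjd_r)
    mjd_r = mjd_r[r_idx]
    mag_r = mag_r[r_idx]
    magerr_r = magerr_r[r_idx]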
    
    tseg_g = mjd_g.max() - mjd_g.min()
    tseg_r = mjd_r.max() - mjd_r.min()

    ztf_lc = {"mjd_g": mjd_g, "mag_g": mag_g, "magerr_g": magerr_g,
              "mjd_r": mjd_r, "mag_r": mag_r, "magerr_r": magerr_r,
              "tseg_g": tseg_g, "tseg_r": tseg_r,
              "ng": len(mjd_g), "nr": len(mjd_r), 
              "zero_g": mjd_g[0], "zero_r":mjd_r[0]}
    
    return ztf_lc
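
The time stamps in the DataFrame are not sorted (see the `head()` output above), which is why the function sorts each band before computing the segment length. A minimal sketch of that step on three made-up g-band points (the values are illustrative, loosely based on the first row):

```python
import numpy as np

# three unsorted, made-up g-band points
mjd = np.array([58206.13, 58204.20, 58450.43])
mag = np.array([17.99, 18.05, 17.89])
magerr = np.array([0.043, 0.044, 0.041])

# one argsort keeps times, magnitudes and errors aligned
order = np.argsort(mjd)
mjd, mag, magerr = mjd[order], mag[order], magerr[order]

# total length of the light curve in days
tseg = mjd[-1] - mjd[0]
```

Applying the same index array to all three arrays keeps each magnitude attached to its own time stamp.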

In [21]:
ztf_lc = get_ztf_lightcurve(ztf_cadence, idx)

In [22]:
ztf_lc["mjd_g"]

array([58202.30657411, 58202.30752311, 58202.3202894 , 58202.32253471,
       58202.32303241, 58202.34365741, 58202.34765051, 58202.3484259 ,
       58202.36528931, 58202.3660417 , 58202.36766201, 58202.3705324 ,
       58203.36204861, 58203.36871531, 58203.38998841, 58203.39092591,
       58203.41055551, 58204.30612271, 58204.32546301, 58204.34760421,
       58204.35298611, 58204.37420141, 58204.37607641, 58204.37748841,
       58204.39862271, 58204.41739581, 58205.29771991, 58205.32548611,
       58205.33863421, 58205.3509259 , 58205.36042821, 58205.37143521,
       58205.37618051, 58205.3769213 , 58205.3797222 , 58205.38172451,
       58205.39694441, 58205.4159144 , 58205.45837961, 58205.48208331,
       58206.27326391, 58206.30195601, 58206.32442131, 58206.32643521,
       58206.35291671, 58206.35432871, 58206.35524301, 58206.35667821,
       58206.39197921, 58206.39391201, 58206.41677081, 58206.43549771,
       58207.30568291, 58207.30725691, 58207.32320601, 58207.32803241,
      

In [23]:
ztf_lc["ng"]

723

Ok, cool. 

How long is this light curve in days?

In [24]:
print("Total length in g band: " + str(ztf_lc["tseg_g"]))
print("Total length in r band: " + str(ztf_lc["tseg_r"]))

Total length in g band: 531.9119906999986
Total length in r band: 535.9404398999977


Now let's sample the Kepler light curve at the ZTF g-band time stamps, with both light curves shifted to start at zero:

In [25]:
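# array for the simulated light curve: time, flux, flux error
ztf_boyajian = np.zeros((ztf_lc["ng"], 3))

# shift the Kepler time stamps so that the light curve starts at zero
kepler_time_normalized = kepler_lc.time - kepler_lc.time[0]

for i in range(ztf_lc["ng"]):
    # ZTF time stamp relative to the first g-band observation
    t = ztf_lc["mjd_g"][i] - ztf_lc["zero_g"]
    
    # index of the first Kepler point at or after this time stamp
    k_idx = kepler_time_normalized.searchsorted(t)
    
    # use kepler_lc (NaNs removed) so the index matches kepler_time_normalized
    ztf_boyajian[i,0] = kepler_lc.time[k_idx]
    ztf_boyajian[i,1] = kepler_lc.flux[k_idx]
    ztf_boyajian[i,2] = kepler_lc.flux_err[k_idx]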
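
Note that `searchsorted` returns the insertion index, i.e. the first Kepler sample at or after the requested time, rather than the nearest one, and it returns `len(array)` for times past the end of the array. For the densely sampled Kepler data the difference is small, but a nearest-neighbour lookup can be sketched as follows. The helper `nearest_index` and the toy time stamps are assumptions for illustration.

```python
import numpy as np

def nearest_index(sorted_times, t):
    """Index of the entry in `sorted_times` closest to each value in `t`."""
    t = np.atleast_1d(t)
    # first index with sorted_times >= t (may equal len(sorted_times))
    right = np.searchsorted(sorted_times, t)
    # clip so that both `right` and `right - 1` are valid indices
    right = np.clip(right, 1, len(sorted_times) - 1)
    left = right - 1
    # pick whichever neighbour is closer in time
    pick_left = (t - sorted_times[left]) <= (sorted_times[right] - t)
    return np.where(pick_left, left, right)

kepler_t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
ztf_t = np.array([0.1, 0.9, 1.26, 5.0])

print(np.searchsorted(kepler_t, ztf_t))  # [1 2 3 5]
print(nearest_index(kepler_t, ztf_t))    # [0 2 3 4]
```

The last ZTF time stamp lies beyond the toy Kepler baseline: plain `searchsorted` gives an out-of-range index of 5, while the clipped version falls back to the final sample.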

In [27]:
fig, ax = plt.subplots(1, 1, figsize=(10,4))

sns.rugplot(ztf_lc["mjd_g"] - ztf_lc["zero_g"], lw=3, color="red", ax=ax)
ax.scatter(kepler_time_normalized, kepler_lc.flux, s=5, c="black")
ax.scatter(ztf_boyajian[:,0] - kepler_lc.time[0], ztf_boyajian[:,1], s=5, c="red")


<matplotlib.collections.PathCollection at 0x7f5637727be0>

Let's make a function to automate these simulations:

In [28]:
def simulate_boyajian(kepler_lc, ztf_lc, start_point=None, 
                      use_mean=True):
    """
    Simulate Boyajian's star's Kepler light curve in a ZTF cadence.
    
    This function uses the (normalized) Kepler light curve of 
    Boyajian's star, and observed time stamps of ZTF observations, to 
    simulate what the Kepler light curve would look like in a ZTF 
    cadence. Right now, it only simulates fluxes, not flux errors. 
    
    Parameters
    -----------
    kepler_lc : lightkurve.LightCurve object
        A Lightcurve object with the Kepler light curve.
        
    ztf_lc : dict
        A dictionary with data from a ZTF light curve, generated by 
        `get_ztf_lightcurve` function.
        
    start_point : {None, "random"}, default None
        If None, start at the beginning of the Kepler light curve. 
        If "random", start somewhere randomly in the Kepler light curve.
        
    use_mean : boolean
        If True, add the mean magnitude of the real ZTF light curve 
        as an offset to the normalized Kepler flux.

    Returns
    -------
    ztf_boyajian_g, ztf_boyajian_r : lightkurve.LightCurve objects
        LightCurve objects with the flux from Boyajian's star as 
        measured by Kepler, sampled at the time stamps of a real 
        ZTF cadence in g- and r-band, respectively. The flux errors 
        are taken from the ZTF data.
    """
    # make an empty array for the output light curve
    ztf_boyajian_g = np.zeros((ztf_lc["ng"], 3))
    ztf_boyajian_r = np.zeros((ztf_lc["nr"], 3))

    # normalize time array of the Kepler and ZTF data to zero
    kepler_time_normalized = kepler_lc.time - kepler_lc.time[0]
    
    mean_ztf_mag_g = np.mean(ztf_lc["mag_g"])
    #print("mean ZTF magnitude in g-band: " + str(mean_ztf_mag_g))

    mean_ztf_mag_r = np.mean(ztf_lc["mag_r"])
    #print("mean ZTF magnitude in r-band: " + str(mean_ztf_mag_r))

    if start_point is None:
        ztf_time_g_normalized = ztf_lc["mjd_g"] - ztf_lc["zero_g"]
        ztf_time_r_normalized = ztf_lc["mjd_r"] - ztf_lc["zero_r"]

    elif start_point == "random":
        # calculate the length of the ZTF light curve
        ztf_tseg = np.max([ztf_lc["tseg_g"], ztf_lc["tseg_r"]])
        random_point = np.random.uniform(0, kepler_time_normalized[-1] - ztf_tseg)
        ztf_time_g_normalized = ztf_lc["mjd_g"] - ztf_lc['zero_g'] + random_point
        ztf_time_r_normalized = ztf_lc["mjd_r"] - ztf_lc['zero_r'] + random_point

    else:
        raise ValueError("start_point must be either None or 'random'")
     
    # let's assign points for the g-band
    for i in range(ztf_lc["ng"]):
        t = ztf_time_g_normalized[i]
        #f = ztf_lc["mag_g"][i]
        fe = ztf_lc["magerr_g"][i]
        
        idx = kepler_time_normalized.searchsorted(t)
        
        # time stamps and flux come from the Kepler data
        ztf_boyajian_g[i,0] = kepler_lc.time[idx]
        
        if use_mean: 
            ztf_boyajian_g[i,1] = kepler_lc.flux[idx] + mean_ztf_mag_g
        else:
            ztf_boyajian_g[i,1] = kepler_lc.flux[idx]
    
        # flux_err comes from ZTF data
        ztf_boyajian_g[i,2] = fe
    
    meta_g = {"ztf_mean_mag": mean_ztf_mag_g}
    ztf_boyajian_g = lightkurve.LightCurve(time=ztf_boyajian_g[:,0], 
                                           flux=ztf_boyajian_g[:,1], 
                                           flux_err=ztf_boyajian_g[:,2],
                                           meta=meta_g)

    # same for the r-band:
    for i in range(ztf_lc["nr"]):
        t = ztf_time_r_normalized[i]
        fe = ztf_lc["magerr_r"][i]
        
        idx = kepler_time_normalized.searchsorted(t)
        
        # time stamps and flux come from the Kepler data
        ztf_boyajian_r[i,0] = kepler_lc.time[idx]
        
        if use_mean: 
            ztf_boyajian_r[i,1] = kepler_lc.flux[idx] + mean_ztf_mag_r
        else:
            ztf_boyajian_r[i,1] = kepler_lc.flux[idx]
    
        # flux_err comes from ZTF data
        ztf_boyajian_r[i,2] = fe
    
    meta_r = {"ztf_mean_mag": mean_ztf_mag_r}

    ztf_boyajian_r = lightkurve.LightCurve(time=ztf_boyajian_r[:,0], 
                                           flux=ztf_boyajian_r[:,1], 
                                           flux_err=ztf_boyajian_r[:,2],
                                           meta=meta_r)


    return ztf_boyajian_g, ztf_boyajian_r
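
The core of the function above, i.e. shifting the ZTF window to a random start and looking up the matching Kepler samples, can also be written without an explicit loop. The following is a minimal numpy-only sketch for a single band on synthetic data; the function name `resample_at_cadence` and all input values are assumptions for illustration, and it assumes the ZTF baseline is shorter than the Kepler baseline.

```python
import numpy as np

def resample_at_cadence(kepler_time, kepler_flux, ztf_mjd, rng=None):
    """Sample a Kepler-like light curve at ZTF-like time stamps."""
    kepler_t = kepler_time - kepler_time[0]
    ztf_t = ztf_mjd - ztf_mjd.min()
    if rng is not None:
        # random start such that the ZTF window still fits in the Kepler baseline
        ztf_t = ztf_t + rng.uniform(0, kepler_t[-1] - ztf_t.max())
    # first Kepler sample at or after each ZTF time stamp, clipped to valid indices
    idx = np.clip(np.searchsorted(kepler_t, ztf_t), 0, len(kepler_t) - 1)
    return kepler_time[idx], kepler_flux[idx]

# synthetic stand-ins: flat light curve with one dip, random ZTF-like time stamps
kepler_time = np.linspace(100.0, 1100.0, 2001)
kepler_flux = np.ones_like(kepler_time)
kepler_flux[1000:1010] = 0.8
ztf_mjd = np.sort(np.random.default_rng(1).uniform(58200.0, 58700.0, size=200))

sim_time, sim_flux = resample_at_cadence(kepler_time, kepler_flux, ztf_mjd,
                                         rng=np.random.default_rng(2))
```

Passing `rng=None` corresponds to `start_point=None` above: the ZTF window then starts at the beginning of the Kepler light curve.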

Let's try this, too:

In [29]:
ztf_boyajian_g, ztf_boyajian_r = simulate_boyajian(kepler_lc, ztf_lc, 
                                                   start_point="random")

In [30]:
ztf_boyajian_r

<lightkurve.lightcurve.LightCurve at 0x7f563760f908>

In [31]:
ztf_boyajian_g

<lightkurve.lightcurve.LightCurve at 0x7f570a942eb8>

Now let's make a function that plots the Kepler light curve and overplots the 
ZTF cadence version:

In [66]:
def plot_cadence(kepler_lc, ztf_boyajian_r=None, ztf_boyajian_g=None, band="both", ax=None):
    """
    Plot the Kepler light curve and overplot the version generated 
    using a ZTF cadence for comparison and diagnostics.
    
    Parameters
    ----------
    kepler_lc : lightkurve.LightCurve object
        A Lightcurve object with the Kepler light curve.
        
    ztf_boyajian_r, ztf_boyajian_g : lightkurve.LightCurve object
        A LightCurve object with the flux from Boyajian's star as 
        measured by Kepler, but using data points derived from a real 
        ZTF cadence in either r- or g-band.

    band : str, {"r", "g", "both"}
        Determines which band to plot. Options are to plot r-band only, 
        g-band only or both.

    ax : matplotlib.pyplot.Axes object
        An Axes object to plot into
        
    """
    
    r_colour = sns.color_palette("colorblind", n_colors=7)[1]
    g_colour = sns.color_palette("colorblind", n_colors=7)[2]
    
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(10,5))
    
    ax.scatter(kepler_lc.time - kepler_lc.time[0], kepler_lc.flux, 
               s=5, c="black", label="Kepler light curve")

    if band == "r" or band == "both":
        assert ztf_boyajian_r is not None, "Need a Lightcurve object to plot r-band magnitudes!"

        sns.rugplot(ztf_boyajian_r.time - kepler_lc.time[0], lw=1, 
                color=r_colour, ax=ax)
        ax.scatter(ztf_boyajian_r.time - kepler_lc.time[0], 
               ztf_boyajian_r.flux-ztf_boyajian_r.meta["ztf_mean_mag"], 
               s=5, c=r_colour, label="simulated r-band points")
    
    if band == "g" or band == "both":
        assert ztf_boyajian_g is not None, "Need a Lightcurve object to plot g-band magnitudes!"

        sns.rugplot(ztf_boyajian_g.time - kepler_lc.time[0], lw=1, 
                color=g_colour, ax=ax)
        ax.scatter(ztf_boyajian_g.time - kepler_lc.time[0], 
               ztf_boyajian_g.flux-ztf_boyajian_g.meta["ztf_mean_mag"], 
               s=5, c=g_colour, label="simulated g-band points")
    
    ax.set_xlim(0, kepler_lc.time[-1]-kepler_lc.time[0])
    ax.legend()
    
    return ax

In [67]:
fig, ax = plt.subplots(1, 1, figsize=(10,5))

plot_cadence(kepler_lc, ztf_boyajian_r, ztf_boyajian_g, "both", ax=ax)




<matplotlib.axes._subplots.AxesSubplot at 0x7f562f4b7a90>

### More Light Curves

Let's do this for some other light curves:

In [34]:
idx_set = np.random.choice(np.arange(0,len(ztf_cadence), step=1), replace=False, size=10)

for idx in idx_set:
    ztf_lc = get_ztf_lightcurve(ztf_cadence, idx)
    ztf_boyajian_g, ztf_boyajian_r = simulate_boyajian(kepler_lc, ztf_lc, 
                                                   start_point="random")
    fig, ax = plt.subplots(1, 1, figsize=(10,5))

    plot_cadence(kepler_lc, ztf_boyajian_r, ztf_boyajian_g, "both", ax=ax)


Okay, let's simulate some light curves doing this:

In [35]:
np.random.seed(100)

# number of simulations to run
nsims = 50000

# number of available cadences from 
# example light curves
ncadences = len(ztf_cadence)

# empty list for example light curves
ztf_boyajian_r_examples, ztf_boyajian_g_examples = [], []

for i in range(nsims):
    # randomly select a row of the DataFrame to simulate a cadence from
    # (the upper bound of np.random.randint is exclusive)
    df_idx = np.random.randint(0, ncadences)
    
    # get the ZTF light curve out of the DataFrame
    ztf_lc = get_ztf_lightcurve(ztf_cadence, df_idx)
    
    # simulate an example
    ztf_boyajian_g, ztf_boyajian_r = simulate_boyajian(kepler_lc, ztf_lc, 
                                                   start_point="random")

    # let's save this in a list
    ztf_boyajian_g_examples.append(ztf_boyajian_g)
    ztf_boyajian_r_examples.append(ztf_boyajian_r)
    

Let's plot some examples:

In [36]:
example_idx = np.random.choice(np.arange(nsims, dtype=int), replace=False, size=10)

for idx in example_idx:
    fig, ax = plt.subplots(1, 1, figsize=(10,5))
    plot_cadence(kepler_lc, ztf_boyajian_r_examples[idx], ztf_boyajian_g_examples[idx], "both", ax=ax)


Let's figure out how often the simulated light curves contain a Boyajian's star-like dip, defined here as any point below 80% of the normalized flux:

In [40]:
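# True for each simulated light curve that contains a dip
flare_present_r, flare_present_g = [], []

for example_lc_r, example_lc_g in zip(ztf_boyajian_r_examples, ztf_boyajian_g_examples):
    # subtract the ZTF mean magnitude offset to recover the normalized flux,
    # then flag light curves where any point drops below 0.8
    b = np.any((example_lc_r.flux - example_lc_r.meta["ztf_mean_mag"]) < 0.8)
    flare_present_r.append(b)

    # same check for the g-band
    b = np.any((example_lc_g.flux - example_lc_g.meta["ztf_mean_mag"]) < 0.8)
    flare_present_g.append(b)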

In [41]:
len(ztf_boyajian_r_examples)

50000

In [43]:
np.sum(flare_present_r)

155
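
To put that number in context, here is a short sketch of the dip criterion and the resulting detection fraction. The counts are the ones printed above; the helper `has_dip` is a made-up name that mirrors the threshold check in the loop.

```python
import numpy as np

def has_dip(norm_flux, threshold=0.8):
    """True if any point of the normalized flux falls below `threshold`."""
    return bool(np.any(np.asarray(norm_flux) < threshold))

# counts taken from the cells above
n_with_dip, n_sims = 155, 50000
dip_fraction = n_with_dip / n_sims
print("{:.2%}".format(dip_fraction))  # 0.31%
```

So only about 0.3% of the simulated r-band light curves catch a dip of at least 20% in flux.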

In [45]:
ztf_boyajian_r_examples = np.array(ztf_boyajian_r_examples)
ztf_boyajian_g_examples = np.array(ztf_boyajian_g_examples)

In [47]:
ztf_with_dips_r = ztf_boyajian_r_examples[flare_present_r]
ztf_with_dips_g = ztf_boyajian_g_examples[flare_present_r]

In [48]:
ztf_with_dips_npoints = np.zeros(len(ztf_with_dips_r))
for i,ztf_lc in enumerate(ztf_with_dips_r):
    ztf_with_dips_npoints[i] = len(ztf_lc.time)

In [49]:
len(ztf_with_dips_npoints)

155

Let's check how many of them come from densely sampled light curves (more than 500 data points):

In [63]:
len(np.argwhere(ztf_with_dips_npoints > 500))

45

Only 45 of the 155 light curves with dips have more than 500 data points. Let's plot up to ten of those with more than 1000 points:

In [68]:
for lc_ztf_r, lc_ztf_g in zip(ztf_with_dips_r[(ztf_with_dips_npoints > 1000)][:10],ztf_with_dips_g[(ztf_with_dips_npoints > 1000)][:10]):
    print(len(lc_ztf_r.time))
    fig, ax = plt.subplots(1, 1, figsize=(10,5))
    plot_cadence(kepler_lc, ztf_boyajian_r=lc_ztf_r, ztf_boyajian_g=lc_ztf_g, band="both", ax=ax)

1200
1235
1013