# SLTimer Analysis on TDC2 Data

This notebook analyzes TDC2 data with SLTimer. We explore the PyCS code to find out which combination of parameters lets PyCS fit the data well. 
Two parameters we can tune are the knotstep of the microlensing spline and the knotstep of the intrinsic spline. To simplify the analysis, we fix the microlensing knotstep at 350 days, because microlensing is unlikely to cause variability on timescales shorter than a year. With that fixed, this notebook asks which intrinsic-spline knotstep gives a good fit, and how that choice affects the recovered time delay. 
We first analyze all the data. We then repeat the analysis with i-band data only, because one possible source of extra noise is a difference in gain between bands. 

##  All data analysis

First, we'll import `SLTimer`, as well as a few other important commands. 

In [None]:
from __future__ import print_function
import os, urllib, numpy as np
%matplotlib inline
import desc.sltimer

%load_ext autoreload
%autoreload 2

## Data Munging

Start a timer object, download some data to use, and plot the data. 

In [None]:
timer = desc.sltimer.SLTimer()
url = "http://www.slac.stanford.edu/~pjm/LSST/DESC/SLTimeDelayChallenge/release/tdc2/gateway/tdc2-gateway-1.txt"
timer.download(url, and_read=True, format='tdc2')
timer.display_light_curves(jdrange=(59500,63100))

Next, we load the true time delay for this system so that we can compare the fits against it.

In [None]:
truthurl = "http://www.slac.stanford.edu/~pjm/LSST/DESC/SLTimeDelayChallenge/release/tdc2/gateway/gatewaytruth.txt"
truthfile = truthurl.split('/')[-1]
if not os.path.isfile(truthfile):
    urllib.urlretrieve(truthurl, truthfile)
d = np.loadtxt(truthfile).transpose()
truth = d[0]
print("True Time Delays:", truth)
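The cell above calls `urllib.urlretrieve`, which exists only in Python 2; in Python 3 the same function lives in `urllib.request`. As a minimal sketch, assuming you want the download step to run under either version, the import can be guarded as below. The helper name `fetch_once` is hypothetical and is not part of SLTimer.

```python
import os

try:
    from urllib.request import urlretrieve  # Python 3 location
except ImportError:
    from urllib import urlretrieve  # Python 2 location

def fetch_once(url, filename=None):
    """Download url to filename unless the file already exists; return the local path."""
    if filename is None:
        # Same convention as the cell above: use the last path component of the URL.
        filename = url.split('/')[-1]
    if not os.path.isfile(filename):
        urlretrieve(url, filename)
    return filename
```

With this helper, the truth file could be fetched with `truthfile = fetch_once(truthurl)` before calling `np.loadtxt(truthfile)`.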

## Preprocessing SLTimer

We offset the light curves to a common mean to get a set of points that look more like they were taken in one filter.
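To make the idea of offsetting to a common mean concrete, here is a minimal sketch on made-up magnitudes. It assumes the offset is a simple per-band shift to the overall mean; this is an illustration only, not necessarily what `timer.whiten()` does internally, and the function name `offset_to_common_mean` is hypothetical.

```python
import numpy as np

def offset_to_common_mean(mags, bands):
    """Shift each band's magnitudes so that every band has the same mean."""
    mags = np.asarray(mags, dtype=float)
    bands = np.asarray(bands)
    target = mags.mean()  # common mean over all bands
    out = mags.copy()
    for band in np.unique(bands):
        sel = bands == band
        # Add the difference between the common mean and this band's mean.
        out[sel] += target - mags[sel].mean()
    return out

# Made-up magnitudes in two bands, for illustration only.
mags = [20.0, 20.2, 21.0, 21.4]
bands = ["g", "g", "i", "i"]
whitened = offset_to_common_mean(mags, bands)
print(whitened)
```

After the shift both bands have the same mean, while the variability within each band is unchanged.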

In [None]:
timer.whiten()

We set the knotstep of the microlensing spline to 350 days, because we expect microlensing not to vary quickly. 

In [None]:
timer.ml_knotstep=350

## Data analyzing

We're now ready to analyze this data. 

In the following code, we generate the likelihood function for different values of the knotstep.  

We build the likelihood function by sampling the time delay regularly from -120 days to 120 days. For each sample, we shift the curves accordingly, fit the light curves with a free-knot spline at that fixed time delay, and compute the chi-square.
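The scan over trial delays can be illustrated with a toy example. This is a sketch under strong assumptions, not the SLTimer or PyCS implementation: the data are synthetic and noiseless, the intrinsic signal is a hypothetical cubic, and a cubic polynomial fit (`np.polyfit`) stands in for the free-knot spline.

```python
import numpy as np

def intrinsic(t):
    # Hypothetical smooth intrinsic signal (a cubic), so the toy model can fit it exactly.
    return 0.02 * t**3 - 0.3 * t**2 + t + 20.0

def chi2_at_delay(t, mag_a, mag_b, sigma, trial_delay, degree=3):
    # Shift curve B back by the trial delay and merge it with curve A.
    t_all = np.concatenate([t, t - trial_delay])
    m_all = np.concatenate([mag_a, mag_b])
    # Fit one smooth model to the merged points (polynomial stand-in for the spline).
    coeffs = np.polyfit(t_all, m_all, degree)
    resid = (m_all - np.polyval(coeffs, t_all)) / sigma
    return float(np.sum(resid**2))

t = np.linspace(0.0, 10.0, 60)       # observation times (arbitrary units)
true_delay = 1.5                     # delay used to build the synthetic curve B
sigma = 0.05                         # assumed constant measurement error
mag_a = intrinsic(t)
mag_b = intrinsic(t - true_delay)    # image B lags image A by true_delay

# Regular grid of trial delays; a step of 0.25 includes the true value.
trial_delays = np.linspace(-3.0, 3.0, 25)
chi2 = np.array([chi2_at_delay(t, mag_a, mag_b, sigma, d) for d in trial_delays])
best_delay = trial_delays[np.argmin(chi2)]
print("best trial delay:", best_delay)
```

In this idealized setting the chi-square is smallest at the true delay, because only there do the two shifted curves trace a single smooth function.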

For each run, we change the knotstep of the spline used to fit the quasar's intrinsic variability. The fit proceeds in three stages:

1. `pycs.spl.topopt.opt_rough(nit=5, knotstep=5/2 * knotstep)`
2. `pycs.spl.topopt.opt_rough(nit=5, knotstep=3/2 * knotstep)`
3. `pycs.spl.topopt.opt_fine(nit=10, knotstep=knotstep)`
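The three stages above amount to a coarse-to-fine schedule of knotsteps. The sketch below only tabulates that schedule for a given target knotstep; it does not call PyCS, and the helper name `knotstep_schedule` is hypothetical.

```python
def knotstep_schedule(knotstep):
    """Return (stage, iterations, knotstep in days) for the three fitting stages."""
    return [
        ("opt_rough", 5, 2.5 * knotstep),   # coarsest spline first
        ("opt_rough", 5, 1.5 * knotstep),   # intermediate refinement
        ("opt_fine", 10, 1.0 * knotstep),   # final fit at the requested knotstep
    ]

for stage, nit, step in knotstep_schedule(20):
    print("{0}: nit={1}, knotstep={2} days".format(stage, nit, step))
```

For `knotstep=20`, the stages use knotsteps of 50, 30 and 20 days.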

In [None]:
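out_dir="./"
def Run(knotstep):
    # Scan 1000 regularly spaced trial delays from -120 to 120 days.
    nsample=1000
    sample=np.linspace(-120,120,nsample).reshape(-1,1)
    # Knotstep of the intrinsic (quasar) spline for this run.
    timer.knotstep=knotstep
    # Compute the chi-square at each trial delay and save the results to a file.
    timer.compute_likelihood_simpleMC(nsample=nsample, nprocess=2,
                                    rangeList=None, outName=out_dir+"test_nolensing_notsStep{0}".format(knotstep),
                                    save_file=True, samples=sample)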

In [None]:
#Run(20)
# Run writes a file whose first column is the time delay and whose second column is the chi-square.
# Because the computation takes about an hour, I suggest downloading the files I have already generated instead.

The files I generated with the `Run` function are available at the following link.

In [None]:
SampleUrl="http://stanford.edu/~chto/SLTimer_TDC2_sample/lensing_350_all/"
name_data="ml350_all_{0}_delay_chi2_1000_samples.txt"

Define functions to download and plot the files.

In [None]:
import os, urllib
def getFile(knotstep):
    name=name_data.format(knotstep)
    url=SampleUrl+name
    urllib.urlretrieve(url, name)
def plot_file(timer, knotstep):
    name=name_data.format(knotstep)
    print(name)
    timer.plot_likelihood_from_file(name, outName="", chisquare=True, bins=200,corner_plot=False)

Define a function that plots the chi-square for a given knotstep and prints the number of degrees of freedom.

In [None]:
def batch_analyze(timer, knotstep):
    timer.knotstep=knotstep
    timer.ml_knotstep=350
    getFile(knotstep)
    plot_file(timer, knotstep)
    print("degree of freedom is : {0}".format(timer.degree_of_freedom()))

Define a function that plots the fitted light curves for a given time delay.

In [None]:
def plot_light_curve(timer, delay, knotstep):
    timer.knotstep=knotstep
    lcs, agn = timer.compute_chisq(delay=[delay], getlcs=True)
    timer.display_light_curves(given_curve=(lcs,agn))

Plot the chi-square distribution for knotstep = 20.

In [None]:
batch_analyze(timer, knotstep=20)

Note:
the minimum chi-square is above 0.2e7 and the number of data points is 2012, which means that on average the fitted curve is more than (0.2e7/2012)^(1/2) ≈ 31.5 sigma away from the data. 
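The arithmetic in this note can be checked in a few lines. This is a minimal sketch; the function name `mean_sigma` is hypothetical, and it simply evaluates the square root of the chi-square per data point using the numbers quoted above.

```python
import math

def mean_sigma(chi2, n_data):
    """Root-mean-square deviation of the fit from the data, in units of sigma."""
    return math.sqrt(chi2 / n_data)

# Numbers quoted in the note: chi-square of about 0.2e7 for 2012 data points.
print(round(mean_sigma(0.2e7, 2012), 2))  # about 31.5
```

A fit that matched the data to within the error bars would give a value close to 1.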

Plot the fitted curve and the data at the true answer (i.e. delay = 55).

In [None]:
plot_light_curve(timer, delay=55, knotstep=20)

Repeat the above analysis with knotstep = 50 and 70.
Note: only 50 and 70 are shown here for demonstration, but I actually ran the analysis with knotstep = 20, 30, 40, 50, 60, 70, 80, 90 and 100. 

In [None]:
batch_analyze(timer, knotstep=50)

In [None]:
plot_light_curve(timer, delay=55, knotstep=50)

In [None]:
batch_analyze(timer, 70)

In [None]:
plot_light_curve(timer, 55, 70)

The above analysis shows that PyCS is not able to recover the true time delay. One thing that could be going wrong is that the gain differs between bands, which would introduce extra structure into the light curves. Therefore, to simplify the problem, in the following I perform the same analysis on i-band data only. 

## i band data analysis

In [None]:
timer_i = desc.sltimer.SLTimer()
url = "http://www.slac.stanford.edu/~pjm/LSST/DESC/SLTimeDelayChallenge/release/tdc2/gateway/tdc2-gateway-1.txt"
timer_i.download(url, and_read=True, format='tdc2')
timer_i.select_bands('i')
timer_i.display_light_curves(jdrange=(59500,63100))
name_data="ml350_i_{0}_delay_chi2_1000_samples.txt"
SampleUrl="http://stanford.edu/~chto/SLTimer_TDC2_sample/lensing_350_iband/"

In [None]:
timer_i.ml_knotstep=350

In [None]:
batch_analyze(timer_i, knotstep=20)
plot_light_curve(timer_i, delay=55, knotstep=20)

In [None]:
batch_analyze(timer_i, knotstep=50)
plot_light_curve(timer_i, delay=55, knotstep=50)

In [None]:
batch_analyze(timer_i, knotstep=70)
plot_light_curve(timer_i, delay=55, knotstep=70)

As you can see, the chi-square drops to very low values at large time delays, no matter which knotstep we use. This happens because, for a long time delay, the shifted light curves no longer overlap with each other, so they no longer constrain each other and the spline is free to fit each curve separately. 
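The loss of overlap can be illustrated with a one-line calculation. This is a sketch for a single observing season; the season length of 100 days is an illustrative value, not taken from the TDC2 data, and the function name `overlap_days` is hypothetical.

```python
def overlap_days(season_length, delay):
    """Length (days) of the window shared by a season and its copy shifted by delay."""
    return max(0.0, season_length - abs(delay))

season_length = 100.0  # hypothetical season length in days, for illustration only
for delay in (0, 55, 120):
    print("delay = {0:4d} days -> overlap = {1:5.1f} days".format(
        delay, overlap_days(season_length, delay)))
```

Once the trial delay exceeds the season length the overlap is zero, and the two curves no longer constrain a common intrinsic spline.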

## Conclusion: 
1. The fitted light curve does not seem to fit the data very well, given the minimum chi-square and the number of data points. 
For example, fitting all the data with knotstep=20 gives a lowest chi-square of 0.21e7 for 2012 data points. This implies that on average the fitted curve is (0.21e7/2012)^(1/2) ≈ 32.3 sigma away from the data. Fitting only the i-band data with knotstep=20 gives a lowest chi-square of 20000 for 312 data points. This implies that on average the fitted curve is (20000/312)^(1/2) ≈ 8.0 sigma away from the data.

2. From the plots of chi-square against time delay, we conclude that our current fitting method is not able to recover the true time delay. Some additional processing is needed. 

To investigate conclusion 1 further, we can test the method on the PyCS tutorial light curves and compute the number of degrees of freedom and the number of data points.

In [None]:
timer_tutorial = desc.sltimer.SLTimer()
url = "https://raw.githubusercontent.com/COSMOGRAIL/PyCS/master/demo/demo1/data/trialcurves.txt"
timer_tutorial.download(url, and_read=True)
timer_tutorial.display_light_curves()

In [None]:
timer_tutorial.ml_knotstep=150
timer_tutorial.knotstep=20
timer_tutorial.add_spline_microlensing()
timer_tutorial.optimize_spline_model()
timer_tutorial.degree_of_freedom()

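We can see that the chi-square for the true answer is 2159.8, while the number of data points is 768. This is surprising, because it means that even on the tutorial data the fitted light curve is on average (2159.8/768)^(1/2) ≈ 1.68 sigma away from the data, rather than the roughly 1 sigma expected for a good fit. 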