# Sigma_intrinsic Analysis
We introduce one parameter, $\sigma_{intrinsic}$, in our spline model because the error bars of the gateway data do not properly describe the extra noise introduced by sampling highly fluctuating quasar light curves. $\sigma_{intrinsic}$ enlarges the error bars in quadrature: $\sigma_{new}=\sqrt{\sigma_{data}^2+\sigma_{intrinsic}^2}$.
In this notebook, I show how the log-likelihood changes for different values of $\sigma_{intrinsic}$.
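The quadrature sum can be illustrated with plain NumPy, independently of `SLTimer`. This is a minimal sketch: the function name `inflate_errors` and the error-bar values are hypothetical and not taken from the gateway data. It shows that the extra term matters most for points whose reported errors are small compared with $\sigma_{intrinsic}$.

```python
import numpy as np

def inflate_errors(sigma_data, sigma_intrinsic):
    """Hypothetical helper: return sqrt(sigma_data**2 + sigma_intrinsic**2) element-wise."""
    sigma_data = np.asarray(sigma_data, dtype=float)
    return np.sqrt(sigma_data**2 + sigma_intrinsic**2)

# Hypothetical reported error bars (not from the gateway data).
sigma_data = np.array([0.01, 0.05, 0.3])
print(inflate_errors(sigma_data, 0.2))
```

With $\sigma_{intrinsic}=0.2$, an error bar of 0.01 becomes about 0.20, while one of 0.3 grows only to about 0.36.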


## All data analysis

First, we import `SLTimer`, along with a few other useful modules.

In [None]:
from __future__ import print_function
import os, urllib, numpy as np
%matplotlib inline
import desc.sltimer

%load_ext autoreload
%autoreload 2

## Data Munging

Start a timer object, download some data to use, and plot the data. 

Download the true time delay.

Note: the factor of -1 comes from the different time delay definitions used by TDC2 and PyCS.

In [None]:
truthurl = "http://www.slac.stanford.edu/~pjm/LSST/DESC/SLTimeDelayChallenge/release/tdc2/gateway/gatewaytruth.txt"
truthfile = truthurl.split('/')[-1]
if not os.path.isfile(truthfile):
    urllib.urlretrieve(truthurl, truthfile)
d = np.loadtxt(truthfile).transpose()
truth = d[0][1]
print("True Time Delays:", -1*truth)

In [None]:
timer = desc.sltimer.SLTimer()
url = "http://www.slac.stanford.edu/~pjm/LSST/DESC/SLTimeDelayChallenge/release/tdc2/gateway/tdc2-gateway-2.txt"
timer.download(url, and_read=True, format='tdc2')
timer.whiten(seasonal=False)
timer.display_light_curves()
name_data="Gateway_2_ml350_all_50_delay_chi2_1000_samples.txt"
SampleUrl="http://stanford.edu/~chto/SLTimer_TDC2_Nolensing_number_of_Knots_test/GateWay2/"

In [None]:
timer.ml_knotstep=350
timer.knotstep=50

In [None]:
import os, urllib

def getFile(knotstep):
    # Download the sample file named by the global `name_data` template.
    name=name_data.format(knotstep)
    url=SampleUrl+name
    urllib.urlretrieve(url, name)

def plot_file(timer, knotstep, batch_sigma=False, method="plot_log_file"):
    # Plot the likelihood stored in the sample file named by `name_data`.
    name=name_data.format(knotstep)
    print(name)
    timer.plot_likelihood_from_file(name, outName="", chisquare=True, bins=200, corner_plot=False, add_prior=True, batch_sigma=batch_sigma, method=method)

def batch_analyze(timer, knotstep, batch_sigma=False, download=True, method="plot_log_file"):
    # Set the knot spacings, optionally download the sample file, then plot it.
    timer.knotstep=knotstep
    timer.ml_knotstep=350
    if download:
        getFile(knotstep)
    plot_file(timer, knotstep, batch_sigma=batch_sigma, method=method)
    print("number of degrees of freedom: {0}".format(timer.degree_of_freedom()))

def plot_light_curve(timer, delay, knotstep, jdrange=(59500,63100)):
    # Recompute the model at a trial `delay` and display the resulting light curves.
    timer.knotstep=knotstep
    lcs, agn = timer.compute_chisq(delay=[delay], getlcs=True)
    timer.display_light_curves(given_curve=(lcs,agn), jdrange=jdrange)

def combile_sigma_File(fileArray, outName):
    # Concatenate several sample files into `outName`, keeping the '#' header
    # lines only from the first file.
    with open(outName, 'w') as outfile:
        for index, fname in enumerate(fileArray):
            with open(fname) as infile:
                for line in infile:
                    if index != 0 and line.startswith('#'):
                        continue
                    outfile.write(line)
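To check the header-handling logic of `combile_sigma_File` without downloading anything, here is a standalone restatement of the same idea run on two toy files. The name `combine_sample_files` and the file contents (a `# delay chi2` header and two rows) are hypothetical and only illustrate the behaviour: the header survives once, and every data row is kept.

```python
import os
import tempfile

def combine_sample_files(file_names, out_name):
    """Hypothetical helper: concatenate files, keeping '#' lines only from the first."""
    with open(out_name, "w") as outfile:
        for index, fname in enumerate(file_names):
            with open(fname) as infile:
                for line in infile:
                    # Skip comment/header lines in every file after the first.
                    if index != 0 and line.startswith("#"):
                        continue
                    outfile.write(line)

# Toy demonstration with two small temporary files.
tmpdir = tempfile.mkdtemp()
paths = []
for i, rows in enumerate(["1.0 2.0\n", "3.0 4.0\n"]):
    path = os.path.join(tmpdir, "part{0}.txt".format(i))
    with open(path, "w") as f:
        f.write("# delay chi2\n" + rows)
    paths.append(path)

combined = os.path.join(tmpdir, "combined.txt")
combine_sample_files(paths, combined)
with open(combined) as f:
    print(f.read())
```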

Download the likelihood files for each value of $\sigma_{intrinsic}$.

In [None]:
name_data_original="Gateway_2_Rescaled_{0}_ml350_all_50_delay_chi2_1000_samples.txt"
for sigma in [0,0.2,0.02,0.002,0.0002]:
    name_data=name_data_original.format(sigma)
    getFile(knotstep=50)

In [None]:
combile_sigma_File(fileArray=["Gateway_2_Rescaled_0_ml350_all_50_delay_chi2_1000_samples.txt",
                              "Gateway_2_Rescaled_0.2_ml350_all_50_delay_chi2_1000_samples.txt",
                              "Gateway_2_Rescaled_0.02_ml350_all_50_delay_chi2_1000_samples.txt",
                              "Gateway_2_Rescaled_0.002_ml350_all_50_delay_chi2_1000_samples.txt",
                              "Gateway_2_Rescaled_0.0002_ml350_all_50_delay_chi2_1000_samples.txt"], 
                   outName='combined.txt')

Plot the combined log-likelihood for all values of $\sigma_{intrinsic}$.

In [None]:
name_data="combined.txt"
batch_analyze(timer, knotstep=50, batch_sigma=True, download=False)

The $\sigma_{intrinsic}=0.2$ result falls outside the range of this figure, so I plot it separately.

In [None]:
name_data="Gateway_2_Rescaled_0.2_ml350_all_50_delay_chi2_1000_samples.txt"
batch_analyze(timer, knotstep=50, method="plot exp in same graph")

In [None]:
name_data="Gateway_2_Rescaled_0.2_ml350_all_50_delay_chi2_1000_samples.txt"
batch_analyze(timer, knotstep=50)

For comparison, I plot $\sigma_{intrinsic}=0$.

In [None]:
name_data="Gateway_2_Rescaled_0_ml350_all_50_delay_chi2_1000_samples.txt"
batch_analyze(timer, knotstep=50)

Another way to see this is to plot the log-likelihood as a function of $\sigma_{intrinsic}$ at a set of fixed trial time delays.
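Why should the log-likelihood peak at a finite $\sigma_{intrinsic}$ at all? Below is a minimal toy sketch that assumes Gaussian residuals and uses synthetic data only; the names `gaussian_loglike` and `true_extra` and all numbers are hypothetical, and nothing here reproduces the internals of `SLTimer` or `chisquare_to_loglikelihood`. Inflating the errors lowers the $\chi^2$ term but raises the $\log(2\pi\sigma_{new}^2)$ normalization term, so the maximum sits near the extra scatter that the reported error bars miss.

```python
import numpy as np

def gaussian_loglike(residuals, sigma_data, sigma_intrinsic):
    """Gaussian log-likelihood with error bars inflated in quadrature."""
    var = sigma_data**2 + sigma_intrinsic**2
    # The log(2*pi*var) term penalises inflating the errors; without it the
    # log-likelihood would keep increasing as sigma_intrinsic grows.
    return -0.5 * np.sum(residuals**2 / var + np.log(2 * np.pi * var))

rng = np.random.RandomState(42)
n = 5000
sigma_data = np.full(n, 0.05)   # toy reported error bars
true_extra = 0.2                # toy extra scatter the error bars miss
residuals = rng.normal(0.0, np.sqrt(sigma_data**2 + true_extra**2))

sigma_grid = np.logspace(-2, 0, 50)  # same grid as sigma_Array below
loglike = np.array([gaussian_loglike(residuals, sigma_data, s) for s in sigma_grid])
best = sigma_grid[np.argmax(loglike)]
print("sigma_intrinsic at maximum log-likelihood:", best)
```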

In [None]:
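def get_loglikelihood(timer, sigma, delay):
    # Build a fresh timer so that rescaling the noise cannot leak into the shared
    # one, but copy the knot settings from the timer that was passed in
    # (previously the argument was shadowed and silently ignored).
    fresh = desc.sltimer.SLTimer()
    fresh.download(url, and_read=True, format='tdc2')
    fresh.whiten(seasonal=False)
    fresh.knotstep = timer.knotstep
    fresh.ml_knotstep = timer.ml_knotstep
    fresh.sigma_intrinsic = sigma
    fresh.rescale_noise()
    loglikelihood = fresh.chisquare_to_loglikelihood(fresh.compute_chisq([delay], batch=False, getlcs=False))
    fresh.reset_noise()
    return loglikelihood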

In [None]:
sigma_Array=np.logspace(-2,0,50)
delays=[-40,-30,-20,-10,0,10,20,30,40]
likelihoodResult={}
for delay in delays:
    likelihood=[]
    newsigmaArray=[]
    for sigma in sigma_Array:
        try:
            likelihood.append(get_loglikelihood(timer, sigma, delay))
            newsigmaArray.append(sigma)
        except Exception as err:
            # Skip sigma values where the fit fails, but report why.
            print("delay = {0}, sigma = {1}: {2}".format(delay, sigma, err))
    likelihoodResult[delay]=[newsigmaArray, likelihood]
# np.save stores the dict as a pickled 0-d object array.
np.save("likelihoodResult.npy", likelihoodResult)
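Because `np.save` wraps the dictionary in a 0-d object array, reading it back requires `allow_pickle=True` and `.item()`. A minimal round-trip sketch on a toy dictionary with the same layout as `likelihoodResult` (`{delay: [sigmas, loglikelihoods]}`); the numbers and the file name `toy_likelihoodResult.npy` are hypothetical:

```python
import os
import tempfile
import numpy as np

# Toy dictionary with the same layout as likelihoodResult; numbers are made up.
toy = {-10: [[0.01, 0.1], [-520.0, -480.0]], 0: [[0.01, 0.1], [-510.0, -470.0]]}
path = os.path.join(tempfile.mkdtemp(), "toy_likelihoodResult.npy")
np.save(path, toy)

# .item() unwraps the 0-d object array back into the original dict.
restored = np.load(path, allow_pickle=True).item()
print(sorted(restored.keys()))
```

In this notebook the equivalent call would be `np.load("likelihoodResult.npy", allow_pickle=True).item()`, which avoids rerunning the slow loop.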

In [None]:
from matplotlib import pyplot as plt
for delay in delays:
    likelihood_delay=np.array(likelihoodResult[delay])
    # -log(-logL) increases with logL (for logL < 0), so its peak marks the best sigma.
    plt.plot(likelihood_delay[0], -np.log(-likelihood_delay[1]), "-", label=str(delay))
plt.xscale("log")  # sigma_Array is log-spaced
plt.ylabel("-log(-log(likelihood))")
plt.xlabel("sigma_intrinsic")
plt.legend(title="delay")

The curves peak near $\sigma_{intrinsic}=0.2$, which is consistent with the results above.
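The peak can also be read off numerically rather than by eye. A minimal sketch, with a hypothetical helper `best_sigma` and toy numbers that are not results from this notebook; it also checks that the `-log(-logL)` transform used in the plot does not move the peak.

```python
import numpy as np

def best_sigma(sigmas, loglikes):
    """Hypothetical helper: return the sigma_intrinsic that maximises the given curve."""
    sigmas = np.asarray(sigmas, dtype=float)
    loglikes = np.asarray(loglikes, dtype=float)
    return sigmas[np.argmax(loglikes)]

# Toy numbers, not results from this notebook.
toy_sigmas = [0.05, 0.1, 0.2, 0.4]
toy_loglikes = [-900.0, -700.0, -650.0, -680.0]
print(best_sigma(toy_sigmas, toy_loglikes))  # prints 0.2

# -log(-logL) is increasing in logL when logL < 0, so the peak is unchanged.
transformed = -np.log(-np.array(toy_loglikes))
print(best_sigma(toy_sigmas, transformed))  # prints 0.2
```

Applied here, `{d: best_sigma(*likelihoodResult[d]) for d in delays}` would give the best $\sigma_{intrinsic}$ per delay, resolved only to the roughly 10% step of `np.logspace(-2, 0, 50)`.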

Conclusion:
1. $\sigma_{intrinsic}=0.2$ does improve the fit.
2. The posterior for $\sigma_{intrinsic}=0.2$ recovers the true time delay.
3. I am still puzzled about why the likelihood for $\sigma_{intrinsic}=0.2$ does not recover the true time delay (in fact, it is even worse than for $\sigma_{intrinsic}=0$).