# Kālī

Kālī models light curves, i.e. ordered lists of flux as a function of time, using stochastic processes such as Continuous-time Autoregressive Moving Average (C-ARMA) processes. 

## Concepts

Kālī consists of multiple Python sub-modules that call underlying C++ libraries using Cython. All the sub-modules are structured around two concepts: `lc` objects that represent light curves, and `task` objects that take `lc` objects as inputs and perform useful work such as fitting light curves to various stochastic models. Both `lc` and `task` are abstract classes that must be inherited from to make concrete, instantiable sub-classes such as `basicLC`, `externalLC`, and `basicTask`.
The `libcarma` sub-module is where the abstract `lc` and `task` classes are defined. The concrete `basicLC`, `externalLC`, and `basicTask` classes are also defined there. The two concrete classes `basicLC` and `externalLC` are designed for slightly different purposes: a `basicLC` is meant for simulated data generated by various `task` objects, while an `externalLC` is meant for one-off analysis of a real light curve. Typically, if multiple real light curves are to be studied with Kālī, it is better to define custom `lc` classes such as the `sdssLC` defined in `sdss.py` and the `k2LC` defined in `k2.py`. Although the provided `sdssLC` and `k2LC` are trivial classes that read the sample data provided in the `examples/data` folder, more serious implementations of such classes can be sophisticated enough to automatically fetch remotely hosted data, pre-process it, determine measurement uncertainties from the observations, and so on. The idea is to let data-set specific sub-classes of `lc` specialize for the data-set being used and keep the rest of the package agnostic to the source of the data. Later on in this document, we will provide guidelines for creating inherited sub-classes of `lc`.

## Using Tasks

To demonstrate how these concepts interact with each other, we first import the `libcarma` sub-module. Next, we create a new basicTask object to actually perform some tasks, in this case to model and analyze C-ARMA(2,1) systems.

In [None]:
import libcarma
newTask = libcarma.basicTask(2,1)

Task objects contain one or more C-ARMA model objects, i.e. each task object can be used to study multiple C-ARMA models simultaneously. The only restriction is that the C-ARMA models held by each task object must share the same dimensionality, i.e. the same p & q. Of course, this does not prevent us from having multiple task objects declared at once. By default, the task object will query your system to determine how many hardware threads (excluding Intel Hyperthreading) your CPU supports and will hold that number of C-ARMA models. However, this behavior can be changed by explicitly instructing the task constructor to create a specific number of C-ARMA models in the task. For example:

In [None]:
anotherNewTask = libcarma.basicTask(3, 1, nthreads = 2)

The statement above has just created a task that holds 2 C-ARMA(3,1) models for use. We can also specify how many burn-in steps the models use when generating simulated light curves by calling the constructor in the following manner:

In [None]:
yetAnotherNewTask = libcarma.basicTask(4, 2, nthreads = 1, nburn = 100000)
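
To illustrate why a burn-in is useful, here is a minimal NumPy-only sketch for the simplest case, a C-ARMA(1,0) (Ornstein-Uhlenbeck) process. It does not use Kālī, and the function `simulate_ou` and its arguments are hypothetical names chosen for this example; Kālī's own burn-in may be implemented differently. The simulation starts from an arbitrary state (x = 0), and the burn-in steps are discarded so that the returned samples are approximately drawn from the stationary distribution.

```python
import numpy as np

def simulate_ou(n_steps, dt, tau, sigma, n_burn=0, seed=0):
    """Simulate dx = -(x / tau) dt + sigma dW on a regular grid.

    The state starts at x = 0; the first n_burn steps are thrown away
    and the following n_steps samples are returned.
    """
    rng = np.random.default_rng(seed)
    phi = np.exp(-dt / tau)                  # one-step decay factor
    stationary_var = 0.5 * sigma**2 * tau    # long-term variance of x
    step_std = np.sqrt(stationary_var * (1.0 - phi**2))
    x = 0.0
    samples = np.empty(n_steps)
    for k in range(n_burn + n_steps):
        x = phi * x + step_std * rng.standard_normal()
        if k >= n_burn:
            samples[k - n_burn] = x
    return samples

# Without burn-in the first sample is still tied to the arbitrary start at 0;
# with burn-in the memory of the starting value has decayed away.
cold = simulate_ou(1000, dt=0.02, tau=50.0, sigma=1.0, n_burn=0)
warm = simulate_ou(1000, dt=0.02, tau=50.0, sigma=1.0, n_burn=100000)
print(cold[0], warm[0])
```

The burn-in should span many multiples of the longest timescale in the model; here 100000 steps of 0.02 day cover 40 times the 50-day timescale.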

We must specify a fixed C-ARMA model to be able to do useful things with our newTask. C-ARMA models are specified using a numpy array of parameters. Let us construct the actual parameter values from the roots of the AR & MA polynomials - 

In [None]:
r_1 = +0.73642081+0j
r_2 = -0.01357919+0j
m_1 = -5.83333333
sigma = 7.0e-9

Note that we have purposely made the first root positive i.e. the C-ARMA model corresponding to this set of roots will be unstable. We can use libcarma.coeffs() to compute the polynomial coefficients corresponding to these roots - 

In [None]:
import numpy as np
Theta = libcarma.coeffs(2,1,np.array([r_1, r_2, m_1, sigma]))
print(Theta)

which produces a vector of the AR and MA coefficients that we can use to initialize our model. Before we initialize one of the C-ARMA models in newTask, let's check to see if the parameter vector is valid (we expect that it is not!) - 

In [None]:
print(newTask.check(Theta))

which is of course exactly what we should have expected, since C-ARMA models must have AR & MA roots with negative real parts to be stable! Let's change the value of r_1 and re-check:

In [None]:
r_1 = -0.73642081+0j
Theta = libcarma.coeffs(2,1,np.array([r_1, r_2, m_1, sigma]))
print(Theta)
print(newTask.check(Theta))
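
The relationship between roots, coefficients, and stability can be reproduced independently of Kālī with plain NumPy. The sketch below handles only the AR part; `ar_coefficients` and `is_stable` are hypothetical helper names, and the ordering and normalization of the vector returned by `libcarma.coeffs()` may differ from what `np.poly` produces.

```python
import numpy as np

ar_roots_unstable = np.array([+0.73642081 + 0j, -0.01357919 + 0j])
ar_roots_stable = np.array([-0.73642081 + 0j, -0.01357919 + 0j])

def ar_coefficients(roots):
    # np.poly builds the monic polynomial with the given roots,
    # highest power first; real roots give real coefficients.
    return np.real(np.poly(roots))

def is_stable(roots):
    # Every root must lie strictly in the left half of the complex plane.
    return bool(np.all(np.real(roots) < 0.0))

print(ar_coefficients(ar_roots_stable))   # coefficients of s^2 + a_1 s + a_2
print(is_stable(ar_roots_unstable), is_stable(ar_roots_stable))
```

For the stable pair the AR polynomial is approximately s^2 + 0.75 s + 0.01, since the roots sum to -0.75 and multiply to about 0.01.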

Now that we have established that we have a good C-ARMA model parameter vector, let us set the zero-th C-ARMA model in newTask to use this vector. We also need to pick a time increment, dt, between steps. We'll use the Kepler value of dt = 0.02 day 

In [None]:
dt = 0.02
newTask.set(dt, Theta, tnum = 0)

We can check to see which of the C-ARMA models held by newTask have been set -

In [None]:
newTask.list()

As can be seen above - only the first model has been set with values so far. Let's see what the various C-ARMA model system matrices are...

In [None]:
newTask.show()

As can be seen from the way we issued the previous command, we do not have to specify tnum = 0 because the default for tnum is 0. Now let's see what the long-term standard deviation of light curves made with this model is, i.e. the square root of the asymptotic variance Sigma[0,0]:

In [None]:
import math
math.sqrt(newTask.Sigma()[0,0])
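
For a linear stochastic differential equation, the long-term (stationary) covariance V of the state vector solves the Lyapunov equation A V + V Aᵀ + Q = 0. The sketch below solves it with NumPy for a pure AR model in a textbook companion form. This is an assumption made for illustration: Kālī's internal state-space representation, and therefore the exact meaning of `Sigma()[0,0]`, may differ, and the MA coefficients (ignored here) also affect the variance of the observed flux. The function name `stationary_covariance` is hypothetical.

```python
import numpy as np

def stationary_covariance(ar_coeffs, sigma):
    """Stationary state covariance for s^p + a_1 s^(p-1) + ... + a_p.

    ar_coeffs = [a_1, ..., a_p]; the driving white noise of amplitude
    sigma enters only the last state component.
    """
    a = np.asarray(ar_coeffs, dtype=float)
    p = a.size
    A = np.zeros((p, p))
    A[:-1, 1:] = np.eye(p - 1)     # ones on the super-diagonal
    A[-1, :] = -a[::-1]            # last row: -a_p, ..., -a_1
    Q = np.zeros((p, p))
    Q[-1, -1] = sigma**2
    # Vectorise A V + V A^T = -Q into an ordinary linear system.
    I = np.eye(p)
    lhs = np.kron(I, A) + np.kron(A, I)
    return np.linalg.solve(lhs, -Q.reshape(-1)).reshape(p, p)

V = stationary_covariance([0.75, 0.01], sigma=1.0)
print(np.sqrt(V[0, 0]))   # long-term standard deviation of the first state
```

For p = 2 the result can be checked by hand: V[0,0] = sigma² / (2 a_1 a_2) and V[1,1] = sigma² / (2 a_1), with zero off-diagonal terms.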

Finally, we can simulate a light curve using this C-ARMA model and plot it...

In [None]:
newLC = newTask.simulate(150.0)
import matplotlib.pyplot as plt
plt.figure(1)
plt.plot(newLC.t, newLC.x, color = '#7570b3', zorder = 5, label = r'Intrinsic LC')
plt.xlabel(r'$t$ (MJD)')
plt.ylabel(r'$F$ (arb. units)')
plt.savefig('lc.jpg', dpi = 300)

<img src="lc.jpg">

Now this light curve has no noise. We can simulate noise by fixing two parameters - the fractional level of variability (fracIntrinsicVar = 0.15 by default) and the fractional level of the noise to the signal (fracNoiseToSignal = 0.001 by default). Once we have fixed these two parameters, we can simulate noise as follows

In [None]:
newTask.observe(newLC)

and plot the results...

In [None]:
from matplotlib import gridspec
gs = gridspec.GridSpec(1000, 1000)
fig2 = plt.figure(2)
ax2 = fig2.add_subplot(gs[:,:])
ax2.plot(newLC.t, newLC.x, color = '#7570b3', zorder = 5, label = r'Intrinsic LC')
ax2.errorbar(newLC.t, newLC.y, newLC.yerr, fmt = '.', capsize = 0, color = '#d95f02', markeredgecolor = 'none', zorder = 10, label = r'Observed LC')
ax2.set_xlabel(r'$t$ (MJD)')
ax2.set_ylabel(r'$F$ (arb. units)')

ax3 = fig2.add_subplot(gs[50:299,700:949])
ax3.locator_params(nbins = 3)
ax3.ticklabel_format(useOffset = False)
notMissingDetail = np.where(newLC.mask[10:30] == 1.0)[0] + 10
ax3.plot(newLC.t[notMissingDetail[:]], newLC.x[notMissingDetail[:]], color = '#7570b3', zorder = 15)
ax3.errorbar(newLC.t[notMissingDetail[:]], newLC.y[notMissingDetail[:]], newLC.yerr[notMissingDetail[:]], fmt = '.', capsize = 0, color = '#d95f02', markeredgecolor = 'none', zorder = 10)
ax3.set_xlabel(r'$t$ (MJD)')
ax3.set_ylabel(r'$F$ (arb. units)')

fig2.savefig('lc+err.jpg', dpi = 300)

<img src="lc+err.jpg">

We have added an inset figure to better show what is happening to the light curve. Consider the effect of the noise. A high order C-ARMA(p,q) model with large p but small q tends to be fairly smooth. Gaussian white noise can potentially make the light curve look less smooth than it actually is! We can compute the log likelihood of this light curve modulo a factor that depends solely on the number of observations. 

In [None]:
print(newTask.logLikelihood(newLC))
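
To make the idea of a log likelihood "modulo a factor that depends solely on the number of observations" concrete, here is a minimal sketch of a Kalman-filter likelihood for the simplest case, a C-ARMA(1,0) process observed with Gaussian noise. It is not Kālī's implementation; `car1_log_likelihood` and its arguments (`tau` for the timescale, `var` for the stationary variance) are hypothetical names, and the dropped constant is -(n/2) log(2π).

```python
import numpy as np

def car1_log_likelihood(t, y, yerr, tau, var):
    """Log likelihood of noisy observations of a zero-mean C-ARMA(1,0)
    process, omitting the constant -(n/2) * log(2 * pi)."""
    mean, P = 0.0, var        # stationary prior on the hidden state
    logl = 0.0
    t_prev = t[0]
    for k in range(len(t)):
        # Predict: propagate the state across the (possibly irregular) gap.
        phi = np.exp(-(t[k] - t_prev) / tau)
        mean = phi * mean
        P = phi**2 * P + var * (1.0 - phi**2)
        # Score the observation against the prediction.
        S = P + yerr[k]**2
        resid = y[k] - mean
        logl += -0.5 * (np.log(S) + resid**2 / S)
        # Update: fold the observation into the state estimate.
        gain = P / S
        mean += gain * resid
        P *= (1.0 - gain)
        t_prev = t[k]
    return logl

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 150.0, size=50))   # irregular sampling times
yerr = np.full(t.size, 0.05)
y = rng.standard_normal(t.size)   # placeholder fluxes, only to exercise the code
print(car1_log_likelihood(t, y, yerr, tau=20.0, var=1.0))
```

The filter costs O(n) operations per evaluation, whereas evaluating the same Gaussian likelihood from the full n × n covariance matrix costs O(n³).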

In a Bayesian context, we can assign a prior to the C-ARMA parameter values based on factors such as the length of the light curve and the median/mean separation between observations. For example, C-ARMA model parameters with inbuilt timescales that are significantly longer than the length of the observed light curve must be treated with suspicion. Note that we have to be very careful with how we think about this. It is entirely possible to observe a light curve for too short a duration; that is a failing of the observations, and in such cases it is impossible to get an accurate estimate of the true timescale. But if the algorithm suggests that the actual timescale is many times longer than the duration of our observations, we must treat the result with suspicion. We can compute the log prior probability of a model given a light curve as follows:

In [None]:
print(newTask.logPrior(newLC))

Since this light curve is a little over twice as long as the longest in-built timescale (about 74 days, the reciprocal of the smaller AR root magnitude), and since the predicted variance of the light curve matches well with the observed variance, the log prior is 0.0, indicating that the model is acceptable. Given the log likelihood and the log prior, we can compute the log posterior probability using simple addition:

In [None]:
print(newTask.logPosterior(newLC))

How do we control the prior? The light curve object has three critical variables: maxSigma, minTimescale, and maxTimescale. The default values are conservative: maxSigma = 2.0, minTimescale = 2.0 and maxTimescale = 0.5. How are these values used? If sqrt(Sigma[0,0]), i.e. the theoretical asymptotic standard deviation of the light curve, is greater than maxSigma times the standard deviation of the observed light curve, we set the prior to 0, i.e. the logPrior to -infinity. If any timescale corresponding to the AR coefficients is greater than maxTimescale times the duration of the observed light curve, or shorter than minTimescale times the smallest timestep, we also set the prior to 0. This way, we can insist on the code returning sane results. If the light curve is very short, it may be helpful to turn maxSigma and maxTimescale up a bit. If the light curve is very poorly sampled, it may help to turn minTimescale down. However, loosening the priors too much can result in spurious peaks of the likelihood space being returned as optimal. Generally speaking, it is better to err on the tighter side until we have more experience with C-ARMA modelling.
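
The rules above can be paraphrased as a small stand-alone function. This is only a sketch of the logic as described in the text, not Kālī's actual `logPrior`; the function name `log_prior_sketch`, its arguments, and the choice of 1/|root| as the timescale of each AR root are assumptions made for illustration.

```python
import numpy as np

def log_prior_sketch(ar_roots, model_std, t, y,
                     max_sigma=2.0, min_timescale=2.0, max_timescale=0.5):
    """Return 0.0 if the model passes all three checks, else -inf."""
    duration = t[-1] - t[0]
    smallest_step = np.min(np.diff(t))
    timescales = 1.0 / np.abs(ar_roots)   # assumed definition of a timescale
    if model_std > max_sigma * np.std(y):
        return -np.inf                    # model varies far more than the data
    if np.any(timescales > max_timescale * duration):
        return -np.inf                    # timescale too long for this baseline
    if np.any(timescales < min_timescale * smallest_step):
        return -np.inf                    # timescale below the sampling resolution
    return 0.0

ar_roots = np.array([-0.73642081 + 0j, -0.01357919 + 0j])
rng = np.random.default_rng(0)

t_long = np.arange(0.0, 150.0, 0.02)      # 150-day light curve
y_long = rng.standard_normal(t_long.size)
t_short = np.arange(0.0, 100.0, 0.02)     # 100-day light curve
y_short = rng.standard_normal(t_short.size)

print(log_prior_sketch(ar_roots, 1.0, t_long, y_long))
print(log_prior_sketch(ar_roots, 1.0, t_short, y_short))
```

With these roots the longest timescale is about 73.6 days, so the 150-day light curve passes the maxTimescale = 0.5 check (limit of about 75 days) while the 100-day light curve fails it (limit of about 50 days).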