# Fast GP implementations

In [None]:
%matplotlib inline

In [None]:
%config InlineBackend.figure_format = 'retina'

In [None]:
from matplotlib import rcParams
rcParams["figure.dpi"] = 100
rcParams["figure.figsize"] = 12, 4

## Benchmarking GP codes
Implemented the right way, GPs can be super fast! Let's compare the time it takes to evaluate our GP likelihood and the time it takes to evaluate the likelihood computed with the snazzy ``george`` and ``celerite`` packages. We'll learn how to use both along the way. Let's create a large, fake dataset for these tests:

In [None]:
import numpy as np

np.random.seed(0)
t = np.linspace(0, 10, 10000)
y = np.random.randn(10000)
sigma = np.ones(10000)

### Our GP

In [None]:
def ExpSquaredCovariance(t, A=1.0, l=1.0, tprime=None):
    """
    Return the ``N x M`` exponential squared
    covariance matrix.
    
    """
    if tprime is None:
        tprime = t
    TPrime, T = np.meshgrid(tprime, t)
    return A ** 2 * np.exp(-0.5 * (T - TPrime) ** 2 / l ** 2)


def ln_gp_likelihood(t, y, sigma=0, A=1.0, l=1.0):
    """
    Return the log of the GP likelihood for a dataset y(t)
    with uncertainties sigma, modeled with a Squared Exponential
    Kernel with amplitude A and lengthscale l.
    
    """
    # The covariance matrix, with the measurement variance added on the diagonal
    npts = len(t)
    K = ExpSquaredCovariance(t, A=A, l=l) + sigma ** 2 * np.eye(npts)
    
    # The log marginal likelihood
    log_like = -0.5 * np.dot(y.T, np.linalg.solve(K, y))
    log_like -= 0.5 * np.linalg.slogdet(K)[1]
    log_like -= 0.5 * npts * np.log(2 * np.pi)
    
    return log_like

Time to evaluate the GP likelihood:

In [None]:
%%time
ln_gp_likelihood(t, y, sigma)
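
One common way to speed up a dense GP likelihood is to factorize the covariance once with a Cholesky decomposition and reuse the factor for both the solve and the log determinant. The sketch below assumes ``scipy`` is installed; the function name ``ln_gp_likelihood_cholesky`` is made up for this illustration and is not part of any package used here.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def ln_gp_likelihood_cholesky(t, y, sigma=0.0, A=1.0, l=1.0):
    """Log GP likelihood via one Cholesky factorization (illustrative sketch)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    npts = len(t)

    # Squared exponential covariance, built by broadcasting
    dt = t[:, None] - t[None, :]
    K = A ** 2 * np.exp(-0.5 * dt ** 2 / l ** 2)

    # Add the measurement variance (scalar or length-N array) to the diagonal
    K[np.diag_indices_from(K)] += np.asarray(sigma, dtype=float) ** 2

    # Factorize K = L L^T once; this is the O(N^3) step
    factor = cho_factor(K, lower=True)

    # K^-1 y from two triangular solves, and ln|K| from the diagonal of L
    alpha = cho_solve(factor, y)
    log_det = 2.0 * np.sum(np.log(np.diag(factor[0])))

    return -0.5 * np.dot(y, alpha) - 0.5 * log_det - 0.5 * npts * np.log(2 * np.pi)
```

Calling ``ln_gp_likelihood_cholesky(t, y, sigma)`` on the dataset above should return the same value as ``ln_gp_likelihood``, up to floating point error. The Cholesky factorization requires a positive definite matrix, so it can fail when ``sigma`` is zero and the kernel alone is nearly singular.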

### george

Let's time how long it takes to do the same operation using the ``george`` package (``pip install george``).

The kernel we'll use is

```python
kernel = amp ** 2 * george.kernels.ExpSquaredKernel(tau ** 2)
```

where ``amp = 1`` and ``tau = 1`` in this case.

To instantiate a GP using ``george``, simply run

```python
gp = george.GP(kernel)
```

The ``george`` package pre-computes a lot of matrices that are re-used in different operations, so before anything else, we'll ask it to compute the GP model for our timeseries:

```python
gp.compute(t, sigma)
```

Note that we've only given it the time array and the uncertainties, so as long as those remain the same, you don't have to re-compute anything. This will save you a lot of time in the long run!
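
To see why a separate compute step pays off, here is a toy class that follows the same compute-then-evaluate pattern. It is only a sketch of the idea, not how ``george`` is implemented internally: the class name ``CachedGP`` and its attributes are hypothetical, and it assumes ``scipy`` is available.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


class CachedGP:
    """Toy GP that factorizes once and reuses the result (hypothetical class)."""

    def __init__(self, amp=1.0, tau=1.0):
        self.amp = amp
        self.tau = tau

    def compute(self, t, sigma):
        # Everything here depends only on t and sigma, so it runs once per dataset
        t = np.asarray(t, dtype=float)
        dt = t[:, None] - t[None, :]
        K = self.amp ** 2 * np.exp(-0.5 * dt ** 2 / self.tau ** 2)
        K[np.diag_indices_from(K)] += np.asarray(sigma, dtype=float) ** 2
        self._factor = cho_factor(K, lower=True)  # the expensive O(N^3) step
        self._log_det = 2.0 * np.sum(np.log(np.diag(self._factor[0])))
        self._norm = -0.5 * len(t) * np.log(2 * np.pi)

    def log_likelihood(self, y):
        # Only O(N^2) triangular solves are needed for each new y
        y = np.asarray(y, dtype=float)
        alpha = cho_solve(self._factor, y)
        return -0.5 * np.dot(y, alpha) - 0.5 * self._log_det + self._norm
```

After a single call to ``compute``, ``log_likelihood`` can be evaluated for as many ``y`` vectors (for example, residuals from different mean models) as needed without repeating the factorization.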

Finally, the log likelihood is given by ``gp.log_likelihood(y)``.

How do the speeds compare? Did you get the same value of the likelihood?

In [None]:
import george

In [None]:
%%time
kernel = george.kernels.ExpSquaredKernel(1.0)
gp = george.GP(kernel)
gp.compute(t, sigma)

In [None]:
%%time
print(gp.log_likelihood(y))

``george`` also offers a fancy GP solver called the HODLR solver, which makes some approximations that dramatically speed up the matrix algebra. Let's instantiate the GP object again by passing the keyword ``solver=george.HODLRSolver`` and re-compute the log likelihood. How long did that take? Did we get the same value for the log likelihood?

In [None]:
%%time
gp = george.GP(kernel, solver=george.HODLRSolver)
gp.compute(t, sigma)

In [None]:
%%time
gp.log_likelihood(y)

### celerite

The ``george`` package is super useful for GP modeling, and I recommend you read over the [docs and examples](https://george.readthedocs.io/en/latest/). It implements several different [kernels](https://george.readthedocs.io/en/latest/user/kernels/) that come in handy in different situations, and it has support for multi-dimensional GPs. But if all you care about are GPs in one dimension (in this case, we're only doing GPs in the time domain, so we're good), then ``celerite`` is what it's all about:

```bash
pip install celerite
```

Check out the [docs](https://celerite.readthedocs.io/en/stable/) here, as well as several tutorials. There is also a [paper](https://arxiv.org/abs/1703.09710) that discusses the math behind ``celerite``. The basic idea is that for certain families of kernels, there exist **extremely efficient** methods of factorizing the covariance matrices. Whereas GP fitting typically scales with the number of datapoints $N$ as $N^3$, ``celerite`` is able to do everything in order $N$ (!!!). This is a **huge** advantage, especially for datasets with tens or hundreds of thousands of data points. Using a dense solver (``george``'s default, or any homebuilt GP model) on datasets much larger than about ``10,000`` points quickly becomes impractical, but with ``celerite`` it's a breeze.
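
A quick back-of-the-envelope calculation shows where dense solvers run into trouble. The helper below (a hypothetical name, written just for this illustration) computes the memory needed to store a single dense $N \times N$ covariance matrix in double precision, before any factorization is even attempted:

```python
import numpy as np


def dense_covariance_gigabytes(npts, dtype=np.float64):
    """Memory needed to hold one dense npts x npts matrix, in gigabytes."""
    return npts ** 2 * np.dtype(dtype).itemsize / 1e9


for npts in (1_000, 10_000, 100_000):
    print(npts, "points:", dense_covariance_gigabytes(npts), "GB")
```

Going from $10^4$ to $10^5$ points multiplies the storage by 100 and, with $N^3$ scaling, the factorization cost by 1000, whereas a linear-scaling method only gets 10 times more expensive.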

Next we repeat the timing tests, but this time using ``celerite``. Note that the Exponential Squared Kernel is not available in ``celerite``, because it doesn't have the special form needed to make its factorization fast. Instead, we'll use the ``Matern 3/2`` kernel, which is qualitatively similar and can be approximated quite well in terms of the ``celerite`` basis functions:

```python
kernel = celerite.terms.Matern32Term(np.log(1), np.log(1))
```

Note that ``celerite`` accepts the **log** of the amplitude and the **log** of the timescale. Other than this, we can compute the likelihood using the same syntax as ``george``.
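
To get a feel for how the two kernels differ, it helps to evaluate them side by side. The sketch below writes out the squared exponential and the exact Matern-3/2 kernel in plain ``numpy`` (these are standalone helper functions defined for illustration, not calls into ``george`` or ``celerite``), both with unit amplitude and unit timescale:

```python
import numpy as np


def exp_squared(tau, amp=1.0, ell=1.0):
    """Squared exponential kernel as a function of the time lag tau."""
    return amp ** 2 * np.exp(-0.5 * tau ** 2 / ell ** 2)


def matern32(tau, sigma=1.0, rho=1.0):
    """Exact Matern-3/2 kernel as a function of the time lag tau."""
    x = np.sqrt(3.0) * np.abs(tau) / rho
    return sigma ** 2 * (1.0 + x) * np.exp(-x)


# Tabulate both kernels on a few lags
lags = np.linspace(0.0, 3.0, 7)
for lag, k_se, k_m32 in zip(lags, exp_squared(lags), matern32(lags)):
    print(f"lag = {lag:3.1f}   exp-squared = {k_se:.4f}   matern-3/2 = {k_m32:.4f}")
```

Both kernels equal the variance at zero lag and decay on a similar timescale, but they are not the same function, so the two GPs describe different covariance matrices.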

How much faster did it run? Is the value of the likelihood different from what you found above? Why?

In [None]:
import celerite
from celerite import terms

In [None]:
%%time
kernel = terms.Matern32Term(np.log(1), np.log(1))
gp = celerite.GP(kernel)
gp.compute(t, sigma)

In [None]:
%%time
gp.log_likelihood(y)

<div style="background-color: #D6EAF8; border-left: 15px solid #2E86C1;">
    <h1 style="line-height:2.5em; margin-left:1em;">Exercise (the one and only)</h1>
</div>

Let's use what we've learned about GPs in a real application: fitting an exoplanet transit model in the presence of correlated noise.

Here is a (fictitious) light curve for a star with a transiting planet: 

In [None]:
import matplotlib.pyplot as plt

t, y, yerr = np.loadtxt("data/sample_transit.txt", unpack=True)
plt.errorbar(t, y, yerr=yerr, fmt=".k", capsize=0)
plt.xlabel("time")
plt.ylabel("relative flux");

There is a transit visible to the eye at $t = 0$, which (say) is when you'd expect the planet to transit if its orbit were perfectly periodic. However, a recent paper claims that the planet shows transit timing variations, which are indicative of a second, perturbing planet in the system, and that a transit at $t = 0$ can be ruled out at $3\sigma$. **Your task is to verify this claim.**

Assume you have no prior information on the planet other than that the transit occurs in the observation window, the depth of the transit is somewhere in the range $(0, 1)$, and the transit duration is somewhere between $0.1$ and $1$ day. You don't know the exact process generating the noise, but you are certain that there's correlated noise in the dataset, so you'll have to pick a reasonable kernel and estimate its hyperparameters.


Fit the transit with a simple inverted Gaussian with three free parameters:

```python
def transit_shape(depth, t0, dur):
    return -depth * np.exp(-0.5 * (t - t0) ** 2 / (0.2 * dur) ** 2)
```

*HINT: I borrowed heavily from [this tutorial](https://celerite.readthedocs.io/en/stable/tutorials/modeling/) in the celerite documentation, so you might want to take a look at it...*
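
As one possible starting point (a sketch, not the intended solution), the priors described above can be turned into a log probability for the three transit parameters. Several things here are assumptions made for brevity: the out-of-transit flux baseline is taken to be zero, the kernel hyperparameters are held fixed rather than fit, and ``gp_log_likelihood`` stands for any callable that returns the GP log likelihood of a residual vector (for example the ``log_likelihood`` method of a GP object on which ``compute`` has already been called). All function names below are hypothetical.

```python
import numpy as np


def transit_model(t, depth, t0, dur):
    # Same inverted Gaussian as above, with the time array passed explicitly
    return -depth * np.exp(-0.5 * (t - t0) ** 2 / (0.2 * dur) ** 2)


def log_prior(params, t):
    """Flat priors: depth in (0, 1), t0 inside the window, dur in (0.1, 1)."""
    depth, t0, dur = params
    if not 0.0 < depth < 1.0:
        return -np.inf
    if not np.min(t) < t0 < np.max(t):
        return -np.inf
    if not 0.1 < dur < 1.0:
        return -np.inf
    return 0.0


def log_probability(params, t, y, gp_log_likelihood):
    """Log prior plus GP log likelihood of the residuals about the transit."""
    lp = log_prior(params, t)
    if not np.isfinite(lp):
        return -np.inf
    # Assumes a zero flux baseline; add a free offset if the data need one
    resid = y - transit_model(t, *params)
    return lp + gp_log_likelihood(resid)
```

A function like ``log_probability`` can then be handed to an optimizer or an MCMC sampler, and the resulting posterior on ``t0`` compared with zero.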