# Gaussian Process Modeling of Light Curves

This notebook shows how to model light curves using a Gaussian process (GP).

#### Index<a name="index"></a>
1. [Import Packages](#imports)
2. [Load Dataset](#loadData)
3. [Fit Gaussian Processes](#gps)
4. [Light Curve Visualization](#see)

## 1. Import Packages<a name="imports"></a>

In [None]:
!pip install ../snmachine/

In [None]:
import collections
import os
import pickle
import sys
import time

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

In [None]:
from snmachine import gps, sndata
from utils.plasticc_pipeline import get_directories, load_dataset

In [None]:
%config Completer.use_jedi = False  # disable jedi so that tab autocomplete works

#### Aesthetic settings

In [None]:
%matplotlib inline

sns.set(font_scale=1.3, style="ticks")

## 2. Load Dataset<a name="loadData"></a>

First, **write** in `folder_path` the path to the folder that contains the dataset we want to use.

In [None]:
root_dir = '/share/hypatia/snmachine_resources/plasticc'
folder_path = os.path.join(root_dir, 'data', 'raw_data')

Then, **write** in `data_file_name` the name of the file where your dataset is saved.

In this notebook we use the dataset saved in [2_preprocess_data]().

In [None]:
data_file_name = 'example_dataset_gapless50.pckl'

Load the dataset.
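The `.pckl` extension suggests a pickled Python object. The internals of `load_dataset` are not shown in this notebook, so the following is only a minimal sketch of a pickle round trip, assuming the file is a standard pickle. The toy dictionary and file name are hypothetical stand-ins, not the real dataset.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a dataset; the real object is an snmachine dataset.
toy_dataset = {'object_names': ['7033'],
               'data': {'7033': [(0.0, 1.2), (1.0, 1.5)]}}

# Write to a temporary folder so the sketch leaves the working directory untouched.
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, 'toy_dataset.pckl')

# Save the object in binary mode.
with open(toy_path, 'wb') as output_file:
    pickle.dump(toy_dataset, output_file)

# Load it back; the result should equal the object that was saved.
with open(toy_path, 'rb') as input_file:
    loaded = pickle.load(input_file)
```

Unpickling can run arbitrary code, so only load pickle files from a source you trust.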

In [None]:
data_path = os.path.join(folder_path, data_file_name)
dataset = load_dataset(data_path)

## 3. Fit Gaussian Processes<a name="gps"></a>

**Write** in `saved_gps_path` the path to the folder where the GP files will be saved.
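The cell below derives the folder name by slicing the last five characters off `data_file_name`, which only works for a five-character extension such as `.pckl`. A hedged alternative is sketched here with `os.path.splitext`, which drops the extension whatever its length. The temporary `folder_path` is a hypothetical placeholder, and whether `compute_gps` creates the output folder itself is not stated in this notebook, so creating it up front is only a precaution.

```python
import os
import tempfile

folder_path = tempfile.mkdtemp()  # hypothetical placeholder for the data folder
data_file_name = 'example_dataset_gapless50.pckl'

# Split the file name into its stem and its extension.
dataset_stem, extension = os.path.splitext(data_file_name)
saved_gps_path = os.path.join(folder_path, dataset_stem)

# Create the folder if it is missing; exist_ok avoids an error on reruns.
os.makedirs(saved_gps_path, exist_ok=True)
```

For this file name the result matches the slicing approach, since `dataset_stem` equals `data_file_name[:-5]`.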

In [None]:
saved_gps_path = os.path.join(folder_path, data_file_name[:-5])  # drop '.pckl'

**Choose**:
- `t_min`: minimum time at which to evaluate the Gaussian process regression.
- `t_max`: maximum time at which to evaluate the Gaussian process regression.
- `gp_dim`: dimension of the Gaussian process regression. If `gp_dim` is 1, the filters are fitted independently. If `gp_dim` is 2, the Matern kernel is used to fit light curves in both time and wavelength.
- `number_gp`: number of points at which to evaluate the Gaussian process regression.
- `number_processes`: number of processors to use for parallelisation (**<font color=green>optional</font>**).
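To make these choices concrete, here is a minimal NumPy sketch of a two-dimensional GP (time and wavelength) evaluated on a regular time grid. It is not the `snmachine` implementation. It assumes the grid is evenly spaced and includes both endpoints, uses a Matern-3/2 kernel as one member of the Matern family, takes a zero mean function, and relies on hand-picked length scales and hypothetical toy data and wavelengths.

```python
import numpy as np

# Values mirroring this notebook; the grid construction itself is an assumption.
t_min, t_max, number_gp = 0, 277, 276
gp_times = np.linspace(t_min, t_max, number_gp)  # includes both endpoints


def matern32(x1, x2, length_scales, amplitude=1.0):
    """Matern-3/2 kernel between two sets of (time, wavelength) points."""
    # Scale each dimension by its length scale, then take Euclidean distances.
    diff = (x1[:, None, :] - x2[None, :, :]) / length_scales
    r = np.sqrt((diff ** 2).sum(axis=-1))
    return amplitude ** 2 * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


# Toy observations (hypothetical): a Gaussian-shaped bump seen in two passbands.
rng = np.random.default_rng(0)
obs_times = np.tile(np.linspace(20.0, 250.0, 15), 2)
obs_wavelengths = np.repeat([4800.0, 6200.0], 15)  # hypothetical wavelengths
flux_err = np.full(obs_times.size, 0.05)
flux = np.exp(-0.5 * ((obs_times - 120.0) / 30.0) ** 2)
flux = flux + rng.normal(0.0, flux_err)

x_obs = np.column_stack([obs_times, obs_wavelengths])
length_scales = np.array([40.0, 6000.0])  # hand-picked, not optimised

# Posterior mean in one passband, evaluated on the regular time grid.
x_pred = np.column_stack([gp_times, np.full(number_gp, 6200.0)])
k_obs = matern32(x_obs, x_obs, length_scales) + np.diag(flux_err ** 2)
k_pred = matern32(x_pred, x_obs, length_scales)
gp_mean = k_pred @ np.linalg.solve(k_obs, flux)
```

Under the evenly spaced grid assumption, `t_min = 0`, `t_max = 277` and `number_gp = 276` give a spacing of 277/275, slightly more than one time unit. Because the kernel also depends on wavelength, observations in one passband inform the prediction in the other, which is the point of choosing `gp_dim = 2`.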

In [None]:
t_min = 0
t_max = 277

gp_dim = 2
number_gp = 276
number_processes = 1

In [None]:
gps.compute_gps(dataset, number_gp=number_gp, t_min=t_min, t_max=t_max, 
                gp_dim=gp_dim, output_root=saved_gps_path, 
                number_processes=number_processes)

[Go back to top.](#index)

## 4. Light Curve Visualization<a name="see"></a>

Here we show the light curve of an event and the Gaussian process used to fit it.

In [None]:
obj_show = '7033'
sndata.PlasticcData.plot_obj_and_model(dataset.data[obj_show], 
                                       dataset.models[obj_show])

[Go back to top.](#index)