# Gaussian Process Modeling of Light Curves

In this notebook we show how to model the light curves with a Gaussian process (GP).

#### Index<a name="index"></a>
1. [Import Packages](#imports)
2. [Load the Original Dataset](#loadData)
3. [Fit Gaussian Processes](#gps)
    1. [Set Path to Save GP Files](#saveGps)
    2. [Compute GP Fits](#makeGps)
4. [Light Curve Visualization](#see)

## 1. Import Packages<a name="imports"></a>

In [None]:
import collections
import os
import pickle
import sys
import time

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

In [None]:
from snmachine import gps, sndata
from utils.plasticc_pipeline import create_folder_structure, get_directories, load_dataset

In [None]:
%config Completer.use_jedi = False  # disable Jedi so that tab-completion works

#### Aesthetic settings

In [None]:
%matplotlib inline

sns.set(font_scale=1.3, style="ticks")

## 2. Load Dataset<a name="loadData"></a>

First, **write** in `folder_path` the path to the folder that contains the dataset we want to use.

In [None]:
# os_name = 'baseline_v2_0_paper'
os_name = 'noroll_v2_0_paper'
# os_name = 'presto_v2_0_paper'

folder_path = f'/path/to/folder'

Then, **write** in `data_file_name` the name of the file where your dataset is saved.

In this notebook we use the dataset saved in [2_preprocess_data](2_preprocess_data.ipynb).

In [None]:
is_only_roll = True  # load the '_roll' version of the file
is_updated = True  # load the '_updated' version of the file

In [None]:
#extra_name_to_save = 'ddf'
extra_name_to_save = 'wfd'
#extra_name_to_save = 'ddf_wfd'

# name = 'train'
name = 'test'

# file_id = '000'
file_id = '012'  # test files go from '000' to '012'

data_file_name = f'{name}_{extra_name_to_save}_{file_id}_gapless50.pckl'
if is_only_roll:
    data_file_name = f'{name}_{extra_name_to_save}_{file_id}_roll_gapless50.pckl'
if is_updated:
    data_file_name = data_file_name[:-5] + '_updated.pckl'
data_file_name
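The naming rules above can be collected into one helper. This is a minimal sketch: `build_data_file_name` is a hypothetical function that only reproduces the string logic of the cell above. Note that `data_file_name[:-5]` drops the 5-character extension `.pckl` before `_updated` is appended. The list of test files assumes the IDs run from `'000'` to `'012'`, as the comment on `file_id` suggests.

```python
def build_data_file_name(name, extra_name_to_save, file_id,
                         is_only_roll=True, is_updated=True):
    """Hypothetical helper: rebuild the file name with the rules of the cell above."""
    stem = f'{name}_{extra_name_to_save}_{file_id}'
    if is_only_roll:
        stem += '_roll'
    stem += '_gapless50'
    if is_updated:
        # Equivalent to data_file_name[:-5] + '_updated.pckl' ([:-5] drops '.pckl')
        stem += '_updated'
    return stem + '.pckl'


# All WFD test files, assuming the file IDs run from '000' to '012'
test_files = [build_data_file_name('test', 'wfd', f'{i:03d}') for i in range(13)]
```

For example, `build_data_file_name('test', 'wfd', '012')` gives `'test_wfd_012_roll_gapless50_updated.pckl'`, which is the name the cell above produces with the flags set.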

Load the dataset.

In [None]:
time_start = time.time()  
data_path = os.path.join(folder_path, data_file_name)
dataset = load_dataset(data_path)
print(f'{time.time() - time_start}s')

In [None]:
dataset.get_max_length()

## 3. Fit Gaussian Processes<a name="gps"></a>

### 3.1. Set Path to Save GP Files<a name="saveGps"></a>

We can now generate a folder structure to keep the files organised (option A). Alternatively, write the path to the folder where the GP files will be saved directly in `path_saved_gps` (option B).

**<font color=Orange>A)</font>** Generate the folder structure.

**Write** the name of the folder you want in `analysis_name`. 

In [None]:
analysis_name = data_file_name[:-5]
analysis_name

In [None]:
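# Assumes the last 14 characters of `folder_path` are the name of the data folder;
# they are replaced by 'analyses'
folder_analysis_path = folder_path[:-14] + 'analyses'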
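
The slice `folder_path[:-14]` removes exactly the last 14 characters. It therefore only works if the data folder's name, including any trailing slash, is 14 characters long. Below is a minimal sketch of a length-independent alternative. It assumes that `analyses` should sit next to the data folder. The function `analyses_folder` and the folder name `processed_data` are hypothetical and chosen only for illustration.

```python
import os


def analyses_folder(folder_path):
    """Hypothetical helper: return a folder called 'analyses' next to the data folder."""
    parent = os.path.dirname(folder_path.rstrip('/'))
    return os.path.join(parent, 'analyses')


# Hypothetical data folder; 'processed_data' happens to have 14 characters
data_folder = '/data/noroll_v2_0_paper/processed_data'
sliced = data_folder[:-14] + 'analyses'  # the rule used in the notebook
print(sliced == analyses_folder(data_folder))  # prints True
# With a trailing slash the slice keeps one character too many
print((data_folder + '/')[:-14] + 'analyses')  # prints /data/noroll_v2_0_paper/panalyses
```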

Create the folder structure.

In [None]:
create_folder_structure(folder_analysis_path, analysis_name)

See the folder structure.

In [None]:
directories = get_directories(folder_analysis_path, analysis_name) 
directories

Set the path to the folder to save the GP files.

In [None]:
path_saved_gps = directories['intermediate_files_directory']

**<font color=Orange>B)</font>** Directly choose where to save the GP files.

**Write** the path to the folder to save the GP files in `path_saved_gps`, the variable used by `gps.compute_gps` below.

```python
path_saved_gps = os.path.join(folder_path, data_file_name[:-5])
```

### 3.2. Compute GP Fits<a name="makeGps"></a>

**Choose**:
- `t_min`: minimum time at which to evaluate the Gaussian process regression.
- `t_max`: maximum time at which to evaluate the Gaussian process regression.
- `gp_dim`: dimension of the Gaussian process regression. If `gp_dim` is 1, each filter is fitted independently in time. If `gp_dim` is 2, a Matérn kernel is used to fit the light curve jointly in time and wavelength.
- `number_gp`: number of points at which to evaluate the Gaussian process regression.
- `number_processes`: number of processes to use for parallelisation (**<font color=green>optional</font>**).
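
The `gp_dim = 2` option can be made concrete with a minimal NumPy sketch of a 2D GP regression in (time, wavelength). The sketch fits a toy light curve and evaluates it at `number_gp` evenly spaced times between `t_min` and `t_max` in every filter. It is **not** `gps.compute_gps`, and several pieces are assumptions made for illustration:

- the Matérn-3/2 variant of the kernel;
- the approximate LSST filter wavelengths;
- the length scales and amplitude, which are fixed by hand here, whereas a real fit would typically optimise them;
- the even time grid;
- the toy data.

```python
import numpy as np

# Approximate central wavelengths (Angstrom) of the LSST ugrizy filters (assumed)
BAND_WAVELENGTHS = {'u': 3671.0, 'g': 4827.0, 'r': 6223.0,
                    'i': 7546.0, 'z': 8691.0, 'y': 9712.0}


def matern32(x1, x2, length_scales, amplitude):
    """Matern-3/2 kernel between rows of x1 (n, 2) and x2 (m, 2).

    Each row is (time, wavelength); each axis has its own length scale.
    """
    scaled = (x1[:, None, :] - x2[None, :, :]) / length_scales
    s = np.sqrt(3.0) * np.sqrt(np.sum(scaled ** 2, axis=-1))
    return amplitude ** 2 * (1.0 + s) * np.exp(-s)


def gp_predict(x_train, y_train, y_err, x_test, length_scales, amplitude):
    """Posterior mean and variance of a zero-mean GP with Gaussian noise."""
    k_train = matern32(x_train, x_train, length_scales, amplitude)
    k_train = k_train + np.diag(y_err ** 2)  # add measurement noise
    k_cross = matern32(x_test, x_train, length_scales, amplitude)
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    var = amplitude ** 2 - np.sum(v ** 2, axis=0)
    return mean, var


def evaluate_on_grid(x_train, y_train, y_err, t_min, t_max, number_gp,
                     length_scales, amplitude):
    """Evaluate the GP at number_gp evenly spaced times in every filter."""
    times = np.linspace(t_min, t_max, number_gp)
    predictions = {}
    for band, wavelength in BAND_WAVELENGTHS.items():
        x_test = np.column_stack([times, np.full(number_gp, wavelength)])
        predictions[band] = gp_predict(x_train, y_train, y_err, x_test,
                                       length_scales, amplitude)
    return times, predictions


# Toy light curve observed only in g and r (hypothetical data)
obs_times = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
g_flux = 100.0 * np.exp(-0.5 * ((obs_times - 50.0) / 20.0) ** 2)
r_flux = 0.8 * g_flux
x_train = np.vstack([
    np.column_stack([obs_times, np.full(5, BAND_WAVELENGTHS['g'])]),
    np.column_stack([obs_times, np.full(5, BAND_WAVELENGTHS['r'])]),
])
y_train = np.concatenate([g_flux, r_flux])
y_err = np.full(y_train.shape, 1.0)
length_scales = np.array([20.0, 6000.0])  # days, Angstrom (assumed values)
amplitude = y_train.max()

times, predictions = evaluate_on_grid(x_train, y_train, y_err, t_min=0,
                                      t_max=295, number_gp=292,
                                      length_scales=length_scales,
                                      amplitude=amplitude)
# Filters without data (e.g. u) are predicted through the wavelength correlation
u_mean, u_var = predictions['u']
```

Because the kernel correlates nearby wavelengths, the fit in one filter borrows information from the others. With `gp_dim = 1` each filter would instead get its own time-only GP.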

In [None]:
dataset.get_max_length()

In [None]:
t_min = 0
t_max = 295 # all paper datasets with same range; new wrong procedure

gp_dim = 2
number_gp = 292 # all paper datasets with same number of points

number_processes = 1

In [None]:
gps.compute_gps(dataset, number_gp=number_gp, t_min=t_min, t_max=t_max, 
                gp_dim=gp_dim, output_root=path_saved_gps, 
                number_processes=number_processes)

In [None]:
ini_time = time.time()
# Keep only objects with at least one detection and more than one observation
good_objs = []
for obj in dataset.object_names:
    obj_data = dataset.data[obj]
    if np.sum(obj_data['detected']) > 0 and len(obj_data['mjd']) > 1:
        good_objs.append(obj)
time_taken = time.time() - ini_time
print(f'{time_taken}s')
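
The loop above keeps an object only if it has at least one detection and more than one observation. The light curves might instead be stored in one long table with one row per observation, as in the PLAsTiCC CSV files. In that case the same selection can be sketched with a single `groupby`. The table and its column names `object_id`, `mjd` and `detected` are assumptions chosen to mirror the notebook. `select_good_objects` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical long-format light-curve table: one row per observation
observations = pd.DataFrame({
    'object_id': ['a', 'a', 'a', 'b', 'b', 'c'],
    'mjd': [60000.0, 60003.0, 60007.0, 60001.0, 60010.0, 60002.0],
    'detected': [0, 1, 1, 0, 0, 1],
})


def select_good_objects(observations):
    """Keep objects with at least one detection and more than one observation."""
    per_object = observations.groupby('object_id').agg(
        n_detected=('detected', 'sum'),
        n_obs=('mjd', 'count'),  # number of non-missing mjd values
    )
    is_good = (per_object['n_detected'] > 0) & (per_object['n_obs'] > 1)
    return list(per_object.index[is_good])


toy_good_objs = select_good_objects(observations)
```

Here object `b` is dropped because it has no detection, and object `c` is dropped because it has a single observation.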

In [None]:
if len(good_objs) != len(dataset.object_names):
    print('trimming bad events')
    ini_time = time.time()
    dataset.update_dataset(good_objs)
    dataset.update_dataset(list(dataset.metadata.index))
    time_taken = time.time() - ini_time
    print(time_taken)
    
    # Overwrite the dataset file on disk with the trimmed dataset
    ini_time = time.time()
    with open(data_path, 'wb') as f:
        pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
    time_taken = time.time() - ini_time
    print(time_taken)
else:
    print('good number')
    print(len(good_objs))
    print(len(dataset.object_names))

[Go back to top.](#index)

## 4. Light Curve Visualization<a name="see"></a>

Here we show the light curve of an event and the Gaussian process used to fit it.
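
`sndata.PlasticcData.plot_obj_and_model` produces this figure for us. A similar figure can also be built by hand, as in the minimal matplotlib sketch below. It is not snmachine's implementation. The observations, the GP mean and its uncertainty are hypothetical arrays, and the axis labels are assumptions. The sketch plots the observed fluxes with error bars and the GP mean with a shaded ±1σ band.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical observations in one filter
obs_t = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
obs_flux = np.array([14.0, 60.0, 101.0, 59.0, 13.0])
obs_err = np.full(obs_t.shape, 3.0)
# Hypothetical GP prediction on a regular time grid
grid_t = np.linspace(0, 120, 200)
gp_mean = 100.0 * np.exp(-0.5 * ((grid_t - 50.0) / 20.0) ** 2)
gp_std = np.full(grid_t.shape, 5.0)

fig, ax = plt.subplots(figsize=(7, 4))
color = 'tab:green'
# Observed fluxes with their uncertainties
ax.errorbar(obs_t, obs_flux, yerr=obs_err, fmt='o', color=color, label='g data')
# GP mean and a shaded 1-sigma band in the same colour
ax.plot(grid_t, gp_mean, color=color, label='g GP mean')
ax.fill_between(grid_t, gp_mean - gp_std, gp_mean + gp_std,
                color=color, alpha=0.3)
ax.set_xlabel('Time (days)')
ax.set_ylabel('Flux')
ax.legend()
```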

In [None]:
dataset.object_names

In [None]:
# WFD; `obj_show` must be one of `dataset.object_names`
#obj_show = '670865' # base train
#obj_show = '111116031' # base test 000
obj_show = '8580232' # base test 001
#obj_show = '93626702' # base test 012

#obj_show = '670865' # no roll train ; by coincidence it is the same as baseline
#obj_show = '123728213' # no roll test 000

#obj_show = '706524' # presto train
#obj_show = '27091144' # presto test 000
sndata.PlasticcData.plot_obj_and_model(dataset.data[obj_show], 
                                       dataset.models[obj_show])

[Go back to top.](#index)