# Preprocess dataset

This notebook illustrates the light curve preprocessing described in [Link to paper]().
We use the SN Ia, SN Ibc and SN II events of the PLAsTiCC dataset. **references**.

#### Index<a name="index"></a>
1. [Import packages](#imports)
2. [Load the original dataset](#loadData)
3. [Preprocess light curves](#preprocess)
4. [Save processed PlasticcData](#saveData)
5. [Light curve comparison](#comparison)

## 1. Import packages<a name="imports"></a>

In [None]:
!pip install ../snmachine/

In [None]:
import os
import pickle
import sys

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
from snmachine import sndata
from utils.plasticc_pipeline import get_directories, load_dataset

In [None]:
%config Completer.use_jedi = False  # disable jedi so that tab completion works

#### Aesthetic settings

In [None]:
%matplotlib inline

sns.set(font_scale=1.3, style="ticks")

## 2. Load the original dataset<a name="loadData"></a>

First, **write** the path to the dataset folder in `folder_path`.

In [None]:
root_dir = '/share/hypatia/snmachine_resources/plasticc'
folder_path = os.path.join(root_dir, 'data', 'raw_data')

Then, **write** in `data_file_name` the name of the file where your dataset is saved.

In this notebook we use the dataset created in [1_load_data]().

In [None]:
data_file_name = 'example_dataset.pckl'

Load the dataset.

In [None]:
data_path = os.path.join(folder_path, data_file_name)
dataset = load_dataset(data_path)

Save the data of one event to compare later in the section [Light curve comparison](#comparison). **Choose** the event by modifying `obj_show`.

In [None]:
obj_show = '7033'
obs_before = dataset.data[obj_show].to_pandas()

## 3. Preprocess light curves<a name="preprocess"></a>

**Write** the maximum gap duration allowed in the light curves in `max_gap_length`.

In [None]:
max_gap_length = 50

Each call to `remove_gaps` removes only the first gap longer than the given threshold, so the function must be called several times to remove all the gaps longer than `max_gap_length`.

To introduce uniformity in the dataset, the resulting light curves are translated so that their first observation is at time zero.
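
The idea of repeated gap removal followed by a time shift can be illustrated on a plain list of observation times. This is a minimal sketch and not the `snmachine` implementation: the helper names `remove_first_gap` and `remove_all_gaps` are hypothetical, the example times are invented, and the rule for which side of a gap is kept (the side with more observations) is an assumption that may differ from what `remove_gaps` does.

```python
def remove_first_gap(times, max_gap_length):
    """Split at the first gap longer than `max_gap_length` and keep one side.

    ASSUMPTION: the side with more observations is kept; snmachine may use
    a different criterion.
    Returns the kept times and a flag saying whether a gap was found.
    """
    times = sorted(times)
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > max_gap_length:
            before, after = times[:i], times[i:]
            kept = before if len(before) >= len(after) else after
            return kept, True
    return times, False


def remove_all_gaps(times, max_gap_length):
    """Repeat the single-gap removal until no long gap remains, then shift to t=0."""
    kept, found_gap = remove_first_gap(times, max_gap_length)
    while found_gap:
        kept, found_gap = remove_first_gap(kept, max_gap_length)
    if not kept:
        return []
    first_time = kept[0]
    # Translate the light curve so its first observation is at time zero.
    return [t - first_time for t in kept]


# Hypothetical observation times with two gaps longer than 50.
example_times = [0.0, 10.0, 100.0, 110.0, 120.0, 130.0, 300.0, 305.0]
processed_times = remove_all_gaps(example_times, max_gap_length=50)
print(processed_times)
```

The loop in `remove_all_gaps` plays the role of the repeated `remove_gaps` calls in the next cell: each pass handles one gap, so the number of passes needed depends on how many long gaps a light curve contains.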

In [None]:
dataset.remove_gaps(max_gap_length*2, verbose=True)
dataset.remove_gaps(max_gap_length*2, verbose=True)
dataset.remove_gaps(max_gap_length, verbose=True)
dataset.remove_gaps(max_gap_length, verbose=True)
dataset.remove_gaps(max_gap_length, verbose=True)

In [None]:
print(f'The longest processed light curve has {dataset.get_max_length():.2f} days.')

## 4. Save processed PlasticcData<a name="saveData"></a>

Now, **choose** a path to save the PlasticcData instance (`folder_path_to_save`) and the name of the file (`file_name`).

In [None]:
folder_path_to_save = folder_path
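# Strip the extension and record the gap threshold used in the file name.
file_name = os.path.splitext(data_file_name)[0] + f'_gapless{max_gap_length}.pckl'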
file_name

In [None]:
with open(os.path.join(folder_path_to_save, file_name), 'wb') as f:
    pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
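
To confirm that a pickled file can be read back, a quick round trip is often enough. The sketch below uses a temporary directory and a hypothetical stand-in dictionary instead of the real dataset, so it runs without `snmachine`; the stand-in contents and the temporary file name are illustrative assumptions.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the processed dataset; any picklable object works.
stand_in_dataset = {'object_names': ['7033'], 'max_gap_length': 50}

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, 'example_dataset_gapless50.pckl')
    # Save with the same protocol used in the cell above.
    with open(tmp_path, 'wb') as f:
        pickle.dump(stand_in_dataset, f, pickle.HIGHEST_PROTOCOL)
    # Read the file back and compare with the original object.
    with open(tmp_path, 'rb') as f:
        reloaded = pickle.load(f)

print(reloaded == stand_in_dataset)
```

For the real file, note that `pickle.load` needs the class of the saved object to be importable, so `snmachine` must be installed in the environment that reads it.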

[Go back to top.](#index)

## 5. Light curve comparison<a name="comparison"></a>

Here we show the difference between one original light curve and the transformed one.

In [None]:
obs_after = dataset.data[obj_show]

In [None]:
sndata.plot_lc(obs_before)
plt.axvspan(xmin=729, xmax=849, color='gray', alpha=.3)
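
The limits passed to `plt.axvspan` above are hard-coded for the chosen event, so they no longer apply if `obj_show` is changed. One way to find suitable time intervals for another event is to list the long gaps in its original light curve. The sketch below is an illustration only: `find_long_gaps` is a hypothetical helper, and the times are invented rather than taken from `obs_before`.

```python
def find_long_gaps(times, max_gap_length):
    """Return (start, end) pairs for every gap longer than `max_gap_length`."""
    times = sorted(times)
    return [(earlier, later)
            for earlier, later in zip(times[:-1], times[1:])
            if later - earlier > max_gap_length]


# Hypothetical observation times; in the notebook they would come from the
# time column of `obs_before` (the column name is not shown here).
example_times = [0.0, 20.0, 45.0, 200.0, 215.0, 230.0]
long_gaps = find_long_gaps(example_times, max_gap_length=50)
print(long_gaps)
```

Each `(start, end)` pair marks the observations on either side of a long gap; such values could be passed to `plt.axvspan(xmin=..., xmax=...)` to highlight a region of interest instead of typing the limits by hand.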

In [None]:
sndata.plot_lc(obs_after, False)

[Go back to top.](#index)