# Load dataset into `snmachine`

This notebook shows how to load data into `snmachine` by creating an instance of the `PlasticcData` class from `.csv` files.

#### Index<a name="index"></a>
1. [Import packages](#imports)
2. [Dataset paths](#paths)
3. [Create PlasticcData instance](#createPlasticc)
    1. [Select a subset](#subset) <font color=salmon>(Optional)</font>
4. [Save PlasticcData instance](#save)
    1. [Load PlasticcData instance](#load) <font color=salmon>(Optional)</font>
5. [Repeat for test dataset](#repeat)

## 1. Import packages<a name="imports"></a>

In [None]:
import os
import pickle
import sys

In [None]:
import numpy as np

In [None]:
from snmachine import sndata
from utils.plasticc_pipeline import get_directories, load_dataset

In [None]:
%config Completer.use_jedi = False  # disable jedi to work around unreliable tab completion

## 2. Dataset paths<a name="paths"></a>

First, we need to **write** the path to the folder where the dataset and metadata are, `folder_path`.

In [None]:
folder_path = '../snmachine/example_data'

Then, **write** the name of the dataset and its metadata, respectively `data_file_name` and `metadata_file_name`.

In [None]:
data_file_name = 'plasticc_train_lightcurves.csv'
metadata_file_name = 'plasticc_train_metadata.csv'
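Since creating the dataset takes a while, it can help to confirm that both files exist before starting. The following is a minimal sketch using only the standard library; `find_missing_files` is a hypothetical helper written for this illustration and is not part of `snmachine`.

```python
import os


def find_missing_files(folder, file_names):
    """Return the paths in `folder` that do not exist as regular files."""
    paths = [os.path.join(folder, name) for name in file_names]
    return [path for path in paths if not os.path.isfile(path)]


# Values copied from the cells above; adjust them to your own setup.
folder_path = '../snmachine/example_data'
file_names = ['plasticc_train_lightcurves.csv', 'plasticc_train_metadata.csv']

missing = find_missing_files(folder_path, file_names)
if missing:
    print('Missing files:', missing)
else:
    print('All files found.')
```

If the list is not empty, check that `folder_path` is written relative to the directory from which the notebook is run.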

## 3. Create PlasticcData instance<a name="createPlasticc"></a>

We now create a `PlasticcData` instance. The following cell takes $\sim 2$ min to run.

In [None]:
dataset = sndata.PlasticcData(folder=folder_path, data_file=data_file_name,
                              metadata_file=metadata_file_name)

See the first entries of the metadata.

In [None]:
dataset.metadata.head(10)

### 3.1 Select a subset<a name="subset"></a> <font color=salmon>(Optional)</font>

Sometimes we want a subset of the dataset. Here we illustrate how to generate a `PlasticcData` instance of that subset.

In this example, we choose 90 SNe: 30 each of SN Ia, SN Ibc and SN II. See `note2_modelNames` in [Zenodo](https://zenodo.org/record/2539456#.YGM6R2RKjAM) for the mapping between class numbers and names.

**Replace** the selection below with your chosen subset, or skip this section to use all events.

In [None]:
metadata = dataset.metadata
is_snia = metadata.target == 90  # SN Ia
is_snibc = metadata.target == 62  # SN Ibc
is_snii = metadata.target == 42  # SN II

In [None]:
np.random.seed(42)  # for reproducibility 

objs_to_keep = []
for is_sn in [is_snia, is_snibc, is_snii]:
    objs_to_keep.append(np.random.choice(a=metadata['object_id'][is_sn], 
                                         size=30, replace=False))
objs_to_keep = np.array(objs_to_keep).flatten()

In [None]:
print(f'We keep {len(objs_to_keep)} events.')
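The selection above draws the same number of events from each class. The sketch below repeats the idea on toy arrays (stand-ins for `metadata['object_id']` and `metadata['target']`, not real PLAsTiCC values) and adds two small safeguards that are assumptions of this example rather than part of the notebook: it checks that each class has enough events before sampling without replacement, and it joins the per-class selections with `np.concatenate`, which also works when the classes contribute different numbers of events.

```python
import numpy as np

# Toy stand-ins for metadata['object_id'] and metadata['target'].
object_ids = np.arange(1000, 1012)
targets = np.array([90, 90, 90, 90, 90, 62, 62, 62, 42, 42, 42, 42])

rng = np.random.default_rng(42)  # local generator, for reproducibility
n_per_class = 2

chosen = []
for class_number in [90, 62, 42]:
    candidates = object_ids[targets == class_number]
    # Sampling without replacement needs at least `n_per_class` candidates.
    if len(candidates) < n_per_class:
        raise ValueError(f'Class {class_number} has only '
                         f'{len(candidates)} events.')
    chosen.append(rng.choice(candidates, size=n_per_class, replace=False))

# Join the per-class selections into a single 1D array of object ids.
objs_to_keep_toy = np.concatenate(chosen)
print(f'We keep {len(objs_to_keep_toy)} events.')  # We keep 6 events.
```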

Update the dataset.

In [None]:
dataset.update_dataset(objs_to_keep)

Notice how the first entries of the metadata changed; now we only have 90 events.

In [None]:
dataset.metadata

## 4. Save PlasticcData instance<a name="save"></a>

Now, **choose** a path to save the `PlasticcData` instance created (`folder_path_to_save`) and the name of the file (`file_name`).

In [None]:
folder_path_to_save = folder_path
file_name = 'example_dataset.pckl'

Finally, save the `PlasticcData` instance.

In [None]:
path_to_save = os.path.join(folder_path_to_save, file_name)
with open(path_to_save, 'wb') as f:
    pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)

### 4.1 Load PlasticcData instance<a name="load"></a> <font color=salmon>(Optional)</font>

We can load the saved file to verify whether it was saved correctly.

In [None]:
saved_dataset = load_dataset(path_to_save)
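`load_dataset` comes from this repository's `utils.plasticc_pipeline`. Where that helper is not available, the same round trip can be done with `pickle` directly. The sketch below assumes nothing about `load_dataset` itself; `save_pickle` and `load_pickle` are hypothetical helpers, and a toy dictionary stands in for the `PlasticcData` instance.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialise `obj` to `path` with the highest pickle protocol."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Read back an object previously written with `save_pickle`."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Round trip with a toy object standing in for the dataset.
toy_dataset = {'name': 'toy', 'values': [1.0, 2.5, 3.0]}
with tempfile.TemporaryDirectory() as tmp_folder:
    toy_path = os.path.join(tmp_folder, 'toy_dataset.pckl')
    save_pickle(toy_dataset, toy_path)
    restored = load_pickle(toy_path)

print(restored == toy_dataset)  # True
```

Unpickling can execute arbitrary code, so only load `.pckl` files that come from a trusted source.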

The check below confirms that the saved and in-memory metadata match.

In [None]:
# `equal_nan=True` treats missing values (NaN) in the same position as equal
np.allclose(np.array(saved_dataset.metadata, dtype=float),
            np.array(dataset.metadata, dtype=float), equal_nan=True)
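One caveat with `np.allclose`: by default, NaN entries compare as unequal, so metadata with missing values would fail the comparison even after a perfect round trip. A short sketch on toy arrays (not real metadata) shows the difference the `equal_nan` argument makes.

```python
import numpy as np

# Toy stand-ins for the metadata values; one entry is missing (NaN).
original = np.array([[1.0, 0.5], [2.0, np.nan]])
restored = original.copy()

print(np.allclose(restored, original))                  # False
print(np.allclose(restored, original, equal_nan=True))  # True
```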

## 5. Repeat for test dataset<a name="repeat"></a>

Here we load an example test set that contains only SN Ia, SN Ibc and SN II.

First, we need to **write** the path to the folder where the dataset and metadata are, `folder_path`.

In [None]:
folder_path = '../snmachine/example_data'

Then, **write** the name of the dataset and its metadata, respectively `data_file_name` and `metadata_file_name`.

In [None]:
data_file_name = 'sniabcii_test_lightcurves_example.csv'
metadata_file_name = 'sniabcii_test_metadata_example.csv'

We now create a `PlasticcData` instance. The following cell takes $\sim 1$ min to run.

In [None]:
dataset = sndata.PlasticcData(folder=folder_path, data_file=data_file_name,
                              metadata_file=metadata_file_name)

Now, **choose** a path to save the `PlasticcData` instance created (`folder_path_to_save`) and the name of the file (`file_name`).

In [None]:
folder_path_to_save = folder_path
file_name = 'example_test_dataset.pckl'

Finally, save the `PlasticcData` instance.

In [None]:
path_to_save = os.path.join(folder_path_to_save, file_name)
with open(path_to_save, 'wb') as f:
    pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)

[Go back to top.](#index)