# Create SnanaData

This notebook shows how to load data into `snmachine`. To do so, we create an instance of the `SnanaData` class from pickled (`.pckl`) files.

#### Index<a name="index"></a>
1. [Import packages](#imports)
2. [Dataset paths](#paths)
3. [Create SnanaData instance](#createSnana)
4. [Save SnanaData instance](#save)
    1. [Load SnanaData instance](#load) <font color=salmon>(Optional)</font>

## 1. Import packages<a name="imports"></a>

In [None]:
!pip install ../snmachine/

In [None]:
import os
import pickle
import time

In [None]:
import numpy as np
import pandas as pd

In [None]:
from snmachine import sndata
from utils.plasticc_pipeline import get_directories, load_dataset

In [None]:
%config Completer.use_jedi = False  # enable autocomplete

## 2. Dataset paths<a name="paths"></a>

First, we need to **write** the path to the folder where the dataset and metadata are, `folder_path`.

In [None]:
#os_name = 'baseline_v2_0_paper'
#os_name = 'noroll_v2_0_paper'
#os_name = 'presto_v2_0_paper'

folder_path = '/folder/path'

Then, **write** the name of the dataset and its metadata, respectively `data_file_name` and `metadata_file_name`.

In [None]:
extra_name_to_save = 'ddf'
#extra_name_to_save = 'wfd'

name = 'train'
#name = 'test'

file_id = '000'
#file_id = '012' # until 012

data_file_name = f'file_{name}_{extra_name_to_save}_{file_id}.pckl'
metadata_file_name = f'file_{name}_{extra_name_to_save}_metadata_{file_id}.pckl'
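
Before creating the instance, it can help to confirm that both files exist in `folder_path`. The following is a minimal sketch; `find_missing_files` is a hypothetical helper written for illustration (it is not part of `snmachine`), and the demonstration runs in a temporary folder rather than on the real dataset.

```python
import os
import tempfile


def find_missing_files(folder_path, file_names):
    """Return the entries of `file_names` that are not files in `folder_path`.

    Hypothetical helper for illustration; it is not part of `snmachine`.
    """
    return [file_name for file_name in file_names
            if not os.path.isfile(os.path.join(folder_path, file_name))]


# Demonstration in a temporary folder: only the data file is created.
with tempfile.TemporaryDirectory() as demo_folder:
    demo_data_name = 'file_train_ddf_000.pckl'
    demo_metadata_name = 'file_train_ddf_metadata_000.pckl'
    open(os.path.join(demo_folder, demo_data_name), 'wb').close()
    missing = find_missing_files(demo_folder,
                                 [demo_data_name, demo_metadata_name])

print(missing)
```

In this notebook the equivalent call would be `find_missing_files(folder_path, [data_file_name, metadata_file_name])`; an empty list means both files were found.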

## 3. Create SnanaData instance<a name="createSnana"></a>

We now create a `SnanaData` instance. The different datasets take different amounts of time to load.

In [None]:
dataset = sndata.SnanaData(folder=folder_path, data_file=data_file_name,
                           metadata_file=metadata_file_name, survey_name='lsst',
                           pb_wavelengths=sndata.default_pb_wavelengths['lsst'])

Add a metadata entry flagging whether the dataset is DDF (`True`) or WFD (`False`).

In [None]:
is_ddf = extra_name_to_save == 'ddf'
dataset.metadata['ddf'] = is_ddf

In [None]:
metadata = dataset.metadata

See the first entries of the metadata.

In [None]:
dataset.metadata.head(3)

In [None]:
metadata['original_target'] = metadata.target

In [None]:
ia_labels = [10]
ibc_labels = [20, 21, 25, 26, 27]
ii_labels = [30, 31, 32, 35, 36, 37]
sl_labels = [40]
bg_labels = [11]
ax_labels = [12]

In [None]:
is_ia = np.isin(metadata.target, ia_labels) # 90
is_ibc = np.isin(metadata.target, ibc_labels) # 62
is_ii = np.isin(metadata.target, ii_labels) # 42
is_sl = np.isin(metadata.target, sl_labels) # 95
is_bg = np.isin(metadata.target, bg_labels) # 67
is_ax = np.isin(metadata.target, ax_labels) # 52

In [None]:
# Use .loc to avoid chained assignment, which may silently modify a copy
metadata.loc[is_ia, 'target'] = 90
metadata.loc[is_ibc, 'target'] = 62
metadata.loc[is_ii, 'target'] = 42
metadata.loc[is_sl, 'target'] = 95
metadata.loc[is_bg, 'target'] = 67
metadata.loc[is_ax, 'target'] = 52

In [None]:
dataset.metadata = metadata
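
The relabelling above can also be expressed as a single lookup table. The following is a minimal sketch on a toy `metadata` DataFrame (the object ids and rows are made up for illustration); it uses the same label groups and leaves any label absent from the table unchanged.

```python
import pandas as pd

# Toy stand-in for dataset.metadata; ids and rows are illustrative only.
metadata = pd.DataFrame({'target': [10, 20, 31, 40, 11, 12, 27]},
                        index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# New label -> original labels, using the same groups as above.
label_groups = {
    90: [10],
    62: [20, 21, 25, 26, 27],
    42: [30, 31, 32, 35, 36, 37],
    95: [40],
    67: [11],
    52: [12],
}
# Flatten into original label -> new label.
old_to_new = {old: new for new, olds in label_groups.items() for old in olds}

# Keep the original labels, then replace them in one pass.
metadata['original_target'] = metadata['target']
metadata['target'] = metadata['target'].replace(old_to_new)
```

Keeping the groups in one dictionary means a new class only needs one extra entry, rather than a new label list, mask and assignment.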

## 4. Save SnanaData instance<a name="save"></a>

Now, **choose** a path to save the `SnanaData` instance created (`folder_path_to_save`) and the name of the file (`file_name`).

In [None]:
folder_path_to_save = folder_path
if name == 'test' and extra_name_to_save == 'wfd':
    file_name = f'{name}_{extra_name_to_save}_{file_id}.pckl'
    print(file_name)
else:
    file_name = f'{name}_{extra_name_to_save}.pckl'

In [None]:
folder_path_to_save = folder_path
file_name = f'{name}_{extra_name_to_save}_{file_id}.pckl'
file_name

Finally, save the `SnanaData` instance.

In [None]:
path_to_save = os.path.join(folder_path_to_save, file_name)
print(f'File to save in {path_to_save}')

In [None]:
time_start_saving = time.time()
with open(path_to_save, 'wb') as f:
    pickle.dump(dataset, f, pickle.HIGHEST_PROTOCOL)
elapsed = time.time() - time_start_saving
print(f'{elapsed}s')

### 4.1. Load SnanaData instance<a name="load"></a> <font color=salmon>(Optional)</font>

We can load the saved file to verify whether it was saved correctly.

In [None]:
time_start_loading = time.time()
saved_dataset = load_dataset(path_to_save)
print(f'{time.time() - time_start_loading}s')

In [None]:
metadata = saved_dataset.metadata

The following check returns `True` if the numerical metadata columns of the saved and original datasets are the same.

In [None]:
numerical_cols = ['hostgal_photoz', 'hostgal_photoz_err']

In [None]:
np.allclose(saved_dataset.metadata[numerical_cols], 
            dataset.metadata[numerical_cols])
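
`np.allclose` only covers the numerical columns. A stricter comparison is sketched below on a toy DataFrame rather than a real `SnanaData` instance (the values are made up for illustration): it round-trips the table through `pickle` and compares every column with `pandas.testing.assert_frame_equal`.

```python
import os
import pickle
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal

# Toy metadata table; the values are illustrative only.
toy_metadata = pd.DataFrame({'hostgal_photoz': [0.12, 0.48],
                             'hostgal_photoz_err': [0.01, 0.03],
                             'target': [90, 42],
                             'ddf': [True, True]})

# Save and reload the table in a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'toy_metadata.pckl')
    with open(toy_path, 'wb') as f:
        pickle.dump(toy_metadata, f, pickle.HIGHEST_PROTOCOL)
    with open(toy_path, 'rb') as f:
        loaded_metadata = pickle.load(f)

# Raises AssertionError if any column, dtype or value differs.
assert_frame_equal(loaded_metadata, toy_metadata)
```

In this notebook the analogous call would be `assert_frame_equal(saved_dataset.metadata, dataset.metadata)`, assuming both attributes are pandas DataFrames.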

[Go back to top.](#index)