# Saving and Loading Datasets from YAML
This notebook shows how a TorchSig dataset can be saved to a YAML file and later loaded back from it.
This is useful for sharing or version-controlling a large dataset: other teams can reproduce it exactly from the YAML file, without anyone having to share large arrays of data or separate dataset creation scripts.

---

## Saving to YAML
The dataset below is saved to a YAML file for later use.
Because we give the dataset a seed, the YAML file stores that seed and uses it to reproduce exactly the same data every time the dataset is loaded from the YAML.

In [None]:
from torchsig.utils.defaults import default_dataset
from torchsig.utils.yaml import save_dataset_yaml, load_dataset_yaml

In [None]:
filepath = "./datasets/yaml_test_dataset"

In [None]:
# Basic default dataset used for testing; the seed is what gets stored in the YAML.
dataset = default_dataset(seed=42, target_labels=["class_name", "snr_db"], impairment_level=None)
save_dataset_yaml(filepath, dataset)

In [None]:
print(next(dataset))
print(next(dataset))
print(next(dataset))

## Loading from YAML
Now we load a copy of the same dataset. Because the copy loads the same random seed from YAML, the values returned below should match the values above.

In [None]:
dataset_copy = load_dataset_yaml(filepath)

In [None]:
print(next(dataset_copy))
print(next(dataset_copy))
print(next(dataset_copy))
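
Comparing printed output by eye is error-prone, so the check can also be done programmatically. The following is a minimal sketch, not part of TorchSig: `to_plain` and `first_mismatch` are hypothetical helpers. They assume each sample is a tuple, list, or dict of plain Python values and array-like objects that expose `tolist()` (as NumPy arrays and torch tensors do). If the samples are some other object type, the comparison would need to be adapted.

```python
def to_plain(value):
    """Recursively convert a sample into plain Python objects for comparison."""
    if hasattr(value, "tolist"):  # NumPy arrays and torch tensors expose tolist()
        return value.tolist()
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


def first_mismatch(first, second, num_samples=3):
    """Draw num_samples from both iterators in lockstep.

    Returns the index of the first pair that differs, or None if all pairs match.
    """
    for index in range(num_samples):
        if to_plain(next(first)) != to_plain(next(second)):
            return index
    return None
```

Under those assumptions, `first_mismatch(dataset, dataset_copy)` returning `None` would indicate that the next three samples of the two datasets are identical. Both datasets must have been advanced the same number of times beforehand, as they have been in the cells above.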

### NOTE:
As of TorchSig 2.0.0, transforms cannot be saved to YAML. For the loaded dataset to match the original exactly, either the original dataset must contain no transforms, or the same transforms with the same arguments must be added to the loaded dataset after loading.